Life is short, just have fun. 🤪

💃🏻 About Me

🎓 I am a Ph.D. candidate at the Gaoling School of Artificial Intelligence, Renmin University of China, supervised by Prof. Zhicheng Dou and Prof. Ji-Rong Wen.
🔬 Currently, I am a research intern at the Beijing Academy of Artificial Intelligence (BAAI), mentored by Zheng Liu. I sincerely appreciate the meticulous and insightful guidance of my supervisors and mentor~
🎓 I received my bachelor’s degree from Nankai University in 2022.
Research Interests: Retrieval-augmented generation, Multi-modal retrieval, Long video understanding

📢 News

[2025.09]: Paper MomentSeeker accepted at NeurIPS 2025 🎉🎉 Thanks to all co-authors!!
[2025.06]: Starting this September, I’ll be a visiting student at the 🇮🇹 University of Trento, supervised by Prof. Nicu Sebe. Looking forward to new collaborations and challenges! 🚀
[2025.06]: Released VideoDeepResearch, a novel framework combining a text-only LLM and a multi-modal toolkit to beat SOTA MLLMs. 🔗 Code on GitHub
[2025.03]: Released MemVid, a memory-enhanced RAG framework for long video understanding.
[2025.03]: Paper OmniGen accepted at CVPR 2025 🎉
[2025.02]: Released MomentSeeker, a task-oriented benchmark for long-video moment retrieval.
[2024.11]: Paper FineRAG accepted at COLING 2024 🥳
[2023.08]: Paper VILE accepted at CIKM 2023 🥳

🎓 Education

- Renmin University of China (2022.09 – 2027.06, expected)

Ph.D. in Artificial Intelligence, Supervisors: Prof. Zhicheng Dou and Prof. Ji-Rong Wen
GPA: 3.87/4.0, Key Courses: Intelligent Information Retrieval (A), Machine Learning (A)

- Nankai University (2018.09 – 2022.06)

B.S. in Computer Science, GPA: 91.64/100, Rank: 2/116
CET-6: 538, CET-4: 585

📚 Publications

- VideoExplorer: Think With Videos For Agentic Long-Video Understanding | Paper | GitHub | Preprint

H Yuan, Z Liu, J Zhou, H Qian, JR Wen, Z Dou

Proposes an agentic framework for long video understanding that pairs a text-only large reasoning model with a modular multi-modal toolkit. Outperforms state-of-the-art open-source and proprietary MLLMs, including GPT-4o and Qwen2.5-VL, across a wide range of long video understanding benchmarks.

- MemVid: Memory-enhanced Retrieval Augmentation for Long Video Understanding | Paper | Preprint

H Yuan, Z Liu, M Qin, H Qian, Y Shu, Z Dou, JR Wen

Tackles query-less long-video understanding with a memorizing-reasoning-retrieving-focusing pipeline inspired by human memory.

- MomentSeeker: A Comprehensive Benchmark for Long Video Moment Retrieval | Paper | NeurIPS 2025

H Yuan, J Ni, Y Wang, J Zhou, Z Liang, Z Liu, Z Cao, Z Dou, JR Wen

Introduces a benchmark designed for long video moment retrieval, featuring diverse tasks and multiple query modalities.

- OmniGen: Unified Image Generation | Paper | CVPR 2025

S Xiao, Y Wang, J Zhou, H Yuan, X Xing, R Yan, S Wang, T Huang, Z Liu

A unified diffusion model that handles text-to-image generation, image editing, and conditional generation with a simple end-to-end architecture.

- FineRAG: Fine-grained Retrieval-Augmented Text-to-Image Generation | Paper | COLING 2024

H Yuan, Z Zhao, S Wang, S Xiao, M Ni, Z Liu, Z Dou

Breaks the RAG pipeline into four stages: query decomposition, candidate selection, retrieval-augmented diffusion, and self-reflection.

- VILE: Block-Aware Visual Enhanced Document Retrieval | Paper | CIKM 2023

H Yuan, Z Dou, Y Zhou, Y Guo, JR Wen

Proposes a dense retrieval model that fuses visual and textual signals to improve web page understanding.

💼 Experience

2024.12 – Present, Research Intern, BAAI
Supervised by Zheng Liu
2024.02 – 2024.04, Research Intern, Microsoft Research Asia
Supervised by Chenfei Wu & Nan Duan

🏆 Competitions

🥈 ICPC Asia Shenyang – Silver Medal
🥉 ICPC Asia Kunming – Bronze Medal
🥈 Mathematical Modeling Contest – National Second Prize

🎖 Scholarships

🥇 First-Class Scholarship, Gaoling School of AI – 2022.12
🥇 First-Class Scholarship (Top 5%), Nankai University – 2021.12, 2020.12
🏅 National Scholarship (Top 1.2%), Nankai University – 2019.12

Huaying Yuan (苑华莹)