Life is short, just have fun. 🤪
💃🏻 About Me
- 🎓 I am a Ph.D. candidate at the Gaoling School of Artificial Intelligence, Renmin University of China, supervised by Prof. Zhicheng Dou and Prof. Ji-Rong Wen.
- 🔬 Currently, I am a research intern at the Beijing Academy of Artificial Intelligence (BAAI), mentored by Zheng Liu. I sincerely appreciate their meticulous and insightful guidance.
- 🎓 I received my bachelor’s degree from Nankai University in 2022.
- Research Interests: Retrieval-augmented generation, Multi-modal retrieval, Long video understanding
📢 News
- [2025.06]: Starting this September, I’ll be a visiting student at the 🇮🇹 University of Trento, supervised by Prof. Dr. Nicu Sebe. Looking forward to new collaborations and challenges! 🚀
- [2025.06]: Released VideoDeepResearch, a novel framework combining a text-only LLM and a multi-modal toolkit to beat SOTA MLLMs. 🔗 Code on GitHub
- [2025.03]: Released MemVid, a memory-enhanced RAG framework for long video understanding.
- [2025.03]: Paper OmniGen accepted at CVPR 2025 🎉
- [2025.02]: Released MomentSeeker, a task-oriented benchmark for long-video moment retrieval.
- [2024.11]: FineRAG accepted at COLING 2024 🥳
- [2023.08]: VILE accepted at CIKM 2023 🥳
🎓 Education
- Renmin University of China (2022.09 – 2027.06, expected)
  - Ph.D. in Artificial Intelligence, Supervisors: Prof. Zhicheng Dou and Prof. Ji-Rong Wen
  - GPA: 3.87/4.0, Key Courses: Intelligent Information Retrieval (A), Machine Learning (A)
- Nankai University (2018.09 – 2022.06)
  - B.S. in Computer Science, GPA: 91.64/100, Rank: 2/116
- CET-6: 538, CET-4: 585
📚 Publications
- VideoDeepResearch: Long Video Understanding With Agentic Tool Using | Paper | GitHub | Preprint
H Yuan, Z Liu, J Zhou, H Qian, JR Wen, Z Dou
Proposes an agentic framework for long video understanding that leverages a text-only large reasoning model combined with a modular multi-modal toolkit. Achieves superior performance over state-of-the-art open-source and proprietary MLLMs, including GPT-4o, Qwen2.5-VL, and others, across a wide range of long video understanding benchmarks.
- MemVid: Memory-enhanced Retrieval Augmentation for Long Video Understanding | Paper | Preprint
H Yuan, Z Liu, M Qin, H Qian, Y Shu, Z Dou, JR Wen
Tackles query-less long-video understanding with a memorizing-reasoning-retrieving-focusing pipeline inspired by human memory.
- MomentSeeker: A Comprehensive Benchmark for Long Video Moment Retrieval | Paper | Preprint
H Yuan, J Ni, Y Wang, J Zhou, Z Liang, Z Liu, Z Cao, Z Dou, JR Wen
Introduces a benchmark with 500s+ videos and diverse tasks; includes an MLLM retriever fine-tuned on synthetic data.
- OmniGen: Unified Image Generation | Paper | CVPR 2025
S Xiao, Y Wang, J Zhou, H Yuan, X Xing, R Yan, S Wang, T Huang, Z Liu
A unified diffusion model that handles text-to-image, image editing, and conditional generation via a simple E2E architecture.
- FineRAG: Fine-grained Retrieval-Augmented Text-to-Image Generation | Paper | COLING 2024
H Yuan, Z Zhao, S Wang, S Xiao, M Ni, Z Liu, Z Dou
Breaks the RAG pipeline into 4 stages: query decomposition, candidate selection, retrieval-augmented diffusion, and self-reflection.
- VILE: Block-Aware Visual Enhanced Document Retrieval | Paper | CIKM 2023
H Yuan, Z Dou, Y Zhou, Y Guo, JR Wen
Proposes a dense retrieval model that fuses visual and textual signals to improve web page understanding.
💼 Experiences
- 2024.12 – Present, Research Intern, BAAI
Supervised by Zheng Liu
- 2024.02 – 2024.04, Research Intern, Microsoft Research Asia
Supervised by Chenfei Wu & Nan Duan
🏆 Competitions
- 🥈 ICPC Asia Shenyang – Silver Medal
- 🥉 ICPC Asia Kunming – Bronze Medal
- 🥈 Mathematical Modeling Contest – National Second Prize
🎖 Scholarships
- 🥇 First-Class Scholarship, Gaoling School of AI – 2022.12
- 🥇 First-Class Scholarship (Top 5%), Nankai University – 2021.12, 2020.12
- 🏅 National Scholarship (Top 1.2%), Nankai University – 2019.12