Life is short, just have fun. 🤪

💃🏻 About Me

  • 🎓 I am a Ph.D. candidate at the Gaoling School of Artificial Intelligence, Renmin University of China, supervised by Prof. Zhicheng Dou and Prof. Ji-Rong Wen.
  • 🔬 Currently, I am a research intern at the Beijing Academy of Artificial Intelligence (BAAI), mentored by Zheng Liu. I sincerely appreciate my supervisors' and mentor's meticulous and insightful guidance~
  • 🎓 I received my bachelor’s degree from Nankai University in 2022.
  • Research Interests: Retrieval-augmented generation, Multi-modal retrieval, Long video understanding

📢 News

  • [2025.06]: Starting this September, I’ll be a visiting student at the 🇮🇹 University of Trento, supervised by Prof. Dr. Nicu Sebe. Looking forward to new collaborations and challenges! 🚀

  • [2025.06]: Released VideoDeepResearch, an agentic framework that pairs a text-only LLM with a multi-modal toolkit and outperforms state-of-the-art MLLMs on long video understanding benchmarks. 🔗 Code on GitHub

  • [2025.03]: Released MemVid, a memory-enhanced RAG framework for long video understanding.

  • [2025.03]: Paper OmniGen accepted at CVPR 2025 🎉

  • [2025.02]: Released MomentSeeker, a task-oriented benchmark for long-video moment retrieval.

  • [2024.11]: FineRAG accepted at COLING 2024 🥳

  • [2023.08]: VILE accepted at CIKM 2023 🥳


🎓 Education

- Renmin University of China (2022.09 – 2027.06, expected)

  • Ph.D. in Artificial Intelligence, Supervisors: Prof. Zhicheng Dou and Prof. Ji-Rong Wen
  • GPA: 3.87/4.0, Key Courses: Intelligent Information Retrieval (A), Machine Learning (A)

- Nankai University (2018.09 – 2022.06)

  • B.S. in Computer Science, GPA: 91.64/100, Rank: 2/116
  • CET-6: 538, CET-4: 585

📚 Publications

- VideoDeepResearch: Long Video Understanding With Agentic Tool Using | Paper | GitHub | Preprint

H Yuan, Z Liu, J Zhou, H Qian, JR Wen, Z Dou

Proposes an agentic framework for long video understanding that pairs a text-only large reasoning model with a modular multi-modal toolkit. It outperforms state-of-the-art open-source and proprietary MLLMs, including GPT-4o and Qwen2.5-VL, across a wide range of long video understanding benchmarks.

- MemVid: Memory-enhanced Retrieval Augmentation for Long Video Understanding | Paper | Preprint

H Yuan, Z Liu, M Qin, H Qian, Y Shu, Z Dou, JR Wen

Tackles query-less long-video understanding with a memorizing-reasoning-retrieving-focusing pipeline inspired by human memory.


- MomentSeeker: A Comprehensive Benchmark for Long Video Moment Retrieval | Paper | Preprint

H Yuan, J Ni, Y Wang, J Zhou, Z Liang, Z Liu, Z Cao, Z Dou, JR Wen

Introduces a benchmark built on long videos (500+ seconds) with diverse tasks; includes an MLLM retriever fine-tuned on synthetic data.


- OmniGen: Unified Image Generation | Paper | CVPR 2025

S Xiao, Y Wang, J Zhou, H Yuan, X Xing, R Yan, S Wang, T Huang, Z Liu

A unified diffusion model that handles text-to-image generation, image editing, and conditional generation with a simple end-to-end architecture.


- FineRAG: Fine-grained Retrieval-Augmented Text-to-Image Generation | Paper | COLING 2024

H Yuan, Z Zhao, S Wang, S Xiao, M Ni, Z Liu, Z Dou

Breaks the RAG pipeline into four stages: query decomposition, candidate selection, retrieval-augmented diffusion, and self-reflection.


- VILE: Block-Aware Visual Enhanced Document Retrieval | Paper | CIKM 2023

H Yuan, Z Dou, Y Zhou, Y Guo, JR Wen

Proposes a dense retrieval model that fuses visual and textual signals to improve web page understanding.


💼 Experience

  • 2024.12 – Present, Research Intern, BAAI
    Supervised by Zheng Liu

  • 2024.02 – 2024.04, Research Intern, Microsoft Research Asia
    Supervised by Chenfei Wu & Nan Duan


🏆 Competitions

  • 🥈 ICPC Asia Shenyang – Silver Medal
  • 🥉 ICPC Asia Kunming – Bronze Medal
  • 🥈 Mathematical Modeling Contest – National Second Prize

🎖 Scholarships

  • 🥇 First-Class Scholarship, Gaoling School of AI – 2022.12
  • 🥇 First-Class Scholarship (Top 5%), Nankai University – 2021.12, 2020.12
  • 🏅 National Scholarship (Top 1.2%), Nankai University – 2019.12