Rongsheng Wang

Ph.D. student, The Chinese University of Hong Kong, Shenzhen (2025)

I am Rongsheng Wang (Wáng Róngshèng), currently pursuing my Ph.D. at The Chinese University of Hong Kong, Shenzhen. My primary research interests are Large Language Models (LLMs) and Multimodal LLMs (MLLMs). I love open source and enjoy sharing useful knowledge with everyone. I have been featured in China-Ranking and recognized as an outstanding individual developer on GitHub in China.

I actively contribute to the open-source community on 👾GitHub, where I’ve led or contributed to several notable projects, including ChatPaper, XrayGLM, Awesome-LLM-Resources, and TinyDeepSeek. I also share datasets and model weights from these projects on 🤗HuggingFace.

Education
  • The Chinese University of Hong Kong, Shenzhen

    Ph.D. in Computational Biology and Health Informatics Sep. 2025 - Now

  • Macao Polytechnic University

    M.S. in Big Data and Internet of Things Sep. 2022 - Jul. 2024

  • Henan Polytechnic University

    B.S. in Computer Science (AI) Sep. 2018 - Jul. 2022

Honors & Awards
  • 🥇First Prize of LIC 2025 2025
  • 🎫Kaggle Competitions Expert 2025
  • 🥇Gold Medal of Kaggle AI Mathematical Olympiad - Progress Prize 2 2025
  • 🎫Outstanding Award of JingDong Health - Global AI Innovation Competition 2024
  • 🥇First Prize of Baidu PaddlePaddle AGI Hackathon 2024
  • 🥉Third Prize of DiMTAIC (Organized by JingDong Health) 2023
  • 🥉Third Prize of Baichuan Intelligence and Amazon Cloud AGI Hackathon 2023
  • 🥈Silver Medal of Kaggle RSNA Screening Mammography Cancer Detection 2023
  • 🎫Outstanding Award of IEEE UV 2022 Object Detection Challenge 2022

Experience
  • CUHK (SZ)

    Research Assistant (Supervisor: Benyou Wang) Sep. 2024 - Sep. 2025

  • Qiyuan.Tech

    CTO Oct. 2023 - Now

  • HKUST (GZ)

    Research Assistant (Supervisors: Yun Bai and Xiang Liu) Feb. 2024 - May 2024

News
2025
  • Oct 25: 🎉 Now you can track the most popular AI papers of the day on your phone through our website: Link
  • May 16: 🎉 Two papers accepted by ACL 2025, congrats to all co-authors!
  • Apr 23: 🎉 Our team won the gold medal in the AIMO-2 competition, ranking 14th out of 2213! Competition Link
Selected Publications (view all)
Towards a Conversational EEG Foundation Model Aligned to Textual and Visual Modalities

Ziyi Zeng, Zhenyang Cai, Yixi Cai, Xidong Wang, Junying Chen, Rongsheng Wang, Yipeng Liu, Siqi Cai, Benyou Wang†, Zhiguo Zhang, Haizhou Li († corresponding author)

arXiv 2025 Conference

We introduce WaveMind, a multimodal large language model that unifies EEG and paired modalities in a shared semantic space for generalized, conversational brain-signal interpretation.

Towards Multimodal LLMs for Traditional Chinese Medicine

Junying Chen, Zhenyang Cai, Zhiheng Liu, Yunjin Yang, Rongsheng Wang, Qingying Xiao, Xiangyi Feng, Zhan Su, Jing Guo, Xiang Wan, Guangjun Yu, Haizhou Li, Benyou Wang† († corresponding author)

arXiv 2025 Conference

We introduce ShizhenGPT, the first multimodal LLM tailored for Traditional Chinese Medicine, designed to overcome data scarcity and enable holistic perception across text, images, audio, and physiological signals for advanced TCM diagnosis and reasoning.

A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis

Shunian Chen, Hejin Huang, Yexin Liu, Zihan Ye, Pengcheng Chen, Chenghao Zhu, Michael Guan, Rongsheng Wang, Junying Chen, Guanbin Li, Ser-Nam Lim, Harry Yang, Benyou Wang† († corresponding author)

arXiv 2025 Conference

We introduce TalkVid, a large-scale, high-quality, and demographically diverse video dataset with an accompanying benchmark that enables more robust, fair, and generalizable audio-driven talking head synthesis.

Dual Retrieving and Ranking Medical Large Language Model with Retrieval Augmented Generation

Qimin Yang, Huan Zuo, Runqi Su, Hanyinghong Su, Tangyi Zeng, Huimei Zhou, Rongsheng Wang, Jiexin Chen, Yijun Lin, Zhiyi Chen, Tao Tan† († corresponding author)

Scientific Reports 2025 Journal

We propose a two-step retrieval-augmented generation framework combining embedding search and Elasticsearch with ColBERTv2 ranking, achieving a 10% accuracy boost on complex medical queries while addressing real-time deployment challenges.

Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos

Rongsheng Wang, Junying Chen, Ke Ji, Zhenyang Cai, Shunian Chen, Yunjin Yang, Benyou Wang† († corresponding author)

arXiv 2025 Conference

We introduce MedVideoCap-55K, the first large-scale, diverse, and caption-rich dataset designed for medical video generation. Comprising over 55,000 curated clips from real-world clinical scenarios, it addresses the critical need for both visual fidelity and medical accuracy in applications such as training, education, and simulation.
