I’m a second-year PhD student at Shanghai Jiao Tong University and Shanghai Innovation Institute, advised by Prof. Xiaosong Wang and Prof. Weijie Ma.

Research Interests: World Models, Video Generation, LLM & VLM Reasoning, 3D Reconstruction, Medical AI.

NOTE: I’m currently working on a startup focused on glasses-free 3D display technology. Students with backgrounds in 3D reconstruction and interactive world models are welcome to apply for internships and collaborations. Please send your resume to hr@lynnreal.com.

🔥 News

[2026-07] 🎉 One paper accepted by NPJ Digital Medicine (IF=18.0).
[2026-07] 🎉 One paper accepted by Medical Image Analysis (IF=14.0).
[2026-07] 🎉 One paper accepted by ACM MM 2026.
[2026-06] 🎉 One paper accepted by ECCV 2026.
[2026-06] Open-sourced BiWM, the first bidirectional autoregressive training framework for interactive video world models.
[2026-05] 🎉 Two papers accepted by ICML 2026.
[2026-02] 🎉 One paper accepted by CVPR 2026.
[2026-01] 🎉 One paper accepted by ICASSP 2026 as Oral.
[2025-12] 🎉 One paper accepted by TMI (IF=12.4).
[2025-12] Started internship at Shanda AI Tokyo Research Institute.
[2025-02] 🎉 Two papers accepted by MICCAI 2025.
[2025-02] 🎉 One paper accepted by CVPR 2025 as Highlight.
[2024-09] Started joint Ph.D. program at Shanghai Innovation Institute.
[2023-09] Started internship at Shanghai AI Laboratory.

📝 Selected Publications

MedCCO: Improving Medical Reasoning with Curriculum-Aware Reinforcement Learning

Shaohao Rui, Kaitao Chen, Weijie Ma, Xiaosong Wang, ACM MM 2026

[Paper][Code][BibTeX]

We introduce MedCCO, a curriculum-driven reinforcement learning framework that unifies close-ended and open-ended medical VQA to improve robust, clinically relevant multimodal reasoning.

PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference

Xiaofeng Mao*, Shaohao Rui*, Kaining Ying, Bo Zheng, Chuanhao Li, Mingmin Chi, Kaipeng Zhang, ECCV 2026

* Equal contribution.

[Paper][Code][BibTeX]

We present PackForcing, a framework for autoregressive video diffusion that manages generation history with a three-partition KV-cache: sink tokens for global semantics, highly compressed mid tokens (with dynamic top-k selection), and full-resolution recent tokens for local coherence, plus Temporal RoPE adjustment. It enables long coherent video generation with bounded memory—for example ~2-minute 832×480 video at 16 FPS on one H200 with ~4 GB KV cache and strong VBench temporal metrics—using only short-clip supervision.

BiWM: Advancing Open-Source Interactive Video World Models with Bidirectional Autoregression

Shaohao Rui*, Xiaofeng Mao*, Zhanyu Zhang, Peijia Lin, Yansong Zhu, Yibo Zhang, Haibin Wan, Weijie Ma, Technical Report 2026

* Equal contribution.

[Paper][Code][BibTeX]

We present BiWM, an open-source full-stack framework for interactive video world models under the bidirectional autoregressive paradigm, balancing generation quality, controllability, and inference speed.

AdaThink-Med: Medical Adaptive Thinking with Uncertainty-Guided Length Calibration

Shaohao Rui, Kaitao Chen, Weijie Ma, Xiaosong Wang, ICML 2026

[Paper][Code][BibTeX]

We propose AdaThink-Med, an end-to-end framework that improves adaptive thinking in medical reasoning LLMs via uncertainty-guided length calibration—penalizing overly long chains on easy, solved cases while encouraging deeper reasoning on hard ones. On six medical QA benchmarks it cuts average output length by up to 6.4× with only minor accuracy loss and yields emergent “thinking” vs. “non-thinking” modes.

InvCoSS: Inversion-driven Continual Self-supervised Learning in Medical Multi-modal Image Pre-training

Zihao Luo*, Shaohao Rui*, Zhenyu Tang, Guotai Wang, Xiaosong Wang, CVPR 2026

* Equal contribution.

[Paper][Code][BibTeX]

We propose InvCoSS, an inversion-driven continual self-supervised learning framework for medical multi-modal image pre-training. It synthesizes images by inverting prior-stage models—avoiding raw data replay—while mitigating catastrophic forgetting under privacy constraints. We introduce InvUNet for higher-fidelity inversion and repulsive representation learning to improve diversity; experiments on nine downstream tasks show performance comparable to or better than data-replay methods without storing past raw data.

CardioCoT: Hierarchical Reasoning for Multimodal Survival Analysis

Shaohao Rui, Haoyang Su, Jinyi Xiang, Lian-Ming Wu, Xiaosong Wang, ICASSP 2026 Oral

[Paper][Code][BibTeX]

We propose CardioCoT, a hierarchical reasoning-enhanced survival analysis framework that predicts the recurrence risk of major adverse cardiovascular events (MACE) in patients with acute myocardial infarction (AMI), using postoperative cardiac MRI and clinical notes. CardioCoT integrates evidence-augmented reasoning with imaging data, achieving superior predictive performance and interpretability to support precision clinical decision-making.

BrainMVP: Multi-modal Vision Pre-training for Medical Image Analysis

Shaohao Rui, Lingzhi Chen, Zhenyu Tang, Lilong Wang, Mianxin Liu, Shaoting Zhang, Xiaosong Wang, CVPR 2025 Highlight

[Paper][Code][BibTeX]

We introduce BrainMVP, the first multi-modal vision pre-training method for medical data with missing modalities. Compared with state-of-the-art methods, BrainMVP pre-trained models show superior performance and better generalizability on ten public segmentation and classification benchmarks.

🎖 Honors and Awards

2024.06, Outstanding Undergraduate Graduate.
2023.11, National Scholarship.
2023.09, Outstanding Student Award.

📖 Education

2024.09 - present, PhD, Shanghai Jiao Tong University.

💼 Internships

2025.12 - 2026.04, Shanda AI Tokyo Research Institute, Tokyo, Japan
2023.09 - 2025.11, Shanghai AI Laboratory, Shanghai, China

Shaohao Rui
