AdaThink-Med: Medical Adaptive Thinking with Uncertainty-Guided Length Calibration

1Shanghai Jiao Tong University, 2Shanghai Innovation Institute, 3Shanghai AI Lab, 4Fudan University

Introduction

Recent advances in inference-time scaling with extended long chain-of-thought reasoning have significantly improved the reasoning capabilities of both general-purpose and medical large language models (LLMs). However, these models tend to engage in lengthy reasoning regardless of the difficulty of the input question, inflating inference costs in real-world applications. Enabling adaptive thinking, where models think less on simple questions and more on complex ones, is therefore critical for the effective use of medical LLMs in practice. Despite its importance, there is a lack of end-to-end approaches that enhance the adaptive thinking capabilities of medical LLMs while comprehensively examining the trade-off between performance and computational cost. To bridge this gap, we propose AdaThink-Med, the first end-to-end framework designed to enhance adaptive thinking in medical reasoning models with uncertainty-guided length calibration. AdaThink-Med first generates multiple candidate outputs for each question, evaluates the correctness and uncertainty of each candidate, and then estimates problem difficulty via an uncertainty-guided length calibration module. For low-difficulty questions answered correctly, the framework penalizes longer reasoning paths; for high-difficulty questions answered incorrectly, it encourages extending the chain of thought to explore alternative solutions. On six public medical QA benchmarks, AdaThink-Med achieves up to a 6.4× average length reduction while retaining performance with only minimal degradation. Intriguingly, we observe that AdaThink-Med spontaneously develops two distinct reasoning modes, which we characterize as "non-thinking" and "thinking", demonstrating the model's ability to dynamically suppress redundant reasoning.
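The candidate-sampling and difficulty-estimation steps described above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the `uncertainty` field, the convex-combination weighting, and the 0.5 difficulty threshold are all assumptions for the sketch.

```python
def estimate_difficulty(candidates):
    """Estimate problem difficulty from a group of sampled candidates.

    Each candidate is a dict with a boolean 'correct' flag, an
    'uncertainty' score in [0, 1] (e.g., derived from token entropy),
    and a token 'length'. Difficulty is low when most samples are
    correct and confident, high otherwise.
    """
    if not candidates:
        return 1.0
    error_rate = sum(1 for c in candidates if not c["correct"]) / len(candidates)
    mean_unc = sum(c["uncertainty"] for c in candidates) / len(candidates)
    # Equal-weight combination; the paper's exact calibration may differ.
    return 0.5 * error_rate + 0.5 * mean_unc


def length_reward(candidate, difficulty, group_lengths):
    """Length-calibrated reward term for one candidate (illustrative).

    Penalizes long correct answers on easy problems; rewards longer
    exploration when a hard problem is answered incorrectly.
    """
    l_min, l_max = min(group_lengths), max(group_lengths)
    if l_max == l_min:
        return 0.0
    norm = (candidate["length"] - l_min) / (l_max - l_min)  # 0 = shortest
    if candidate["correct"] and difficulty < 0.5:
        return -(1.0 - difficulty) * norm  # easy + correct: favor brevity
    if not candidate["correct"] and difficulty >= 0.5:
        return difficulty * norm  # hard + wrong: favor a longer chain of thought
    return 0.0
```

Under this shaping, the longest correct answer to an easy question receives the strongest penalty, while longer attempts on hard, unsolved questions are encouraged.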

Figure: Overview of AdaThink-Med.

Length Reward Hack

We conduct pilot experiments to investigate greedy output-length reduction strategies that lack explicit compensation for incorrect responses. Two representative approaches are Kimi1.5, which enforces uniform length reduction across all samples, and ShortBetter, which drives responses toward the minimal length among correct samples. As shown in (a), without such compensation both Kimi1.5 and ShortBetter collapse in output length and accuracy. We initially suspected that this collapse arises primarily in close-ended tasks, where the presence of candidate answers allows the model to adopt the trivial strategy of directly outputting the final answer without any reasoning. To rule out this shortcut, we adopt an open-ended RL setting following HuatuoGPT-o1, in which the model must reason and answer from its world knowledge without being given candidate options. As shown in (b), even in the open-ended setting, greedy length calibration leads to collapse: both accuracy and output length drop sharply and fail to recover. Moreover, we find that the collapsed model no longer produces meaningful answers or follows instructions. We attribute this phenomenon to a form of length reward hacking: in its search for shorter responses, the model converges to a minimal-length strategy that maintains apparent accuracy while maximizing length-based rewards. This shortcut suppresses the emergence of diverse reasoning patterns and degrades instruction following. To address this issue, we introduce a performance compensation mechanism that penalizes shorter outputs when accuracy degrades. This encourages a more balanced trade-off between performance and efficiency, ultimately leading to stable training dynamics.
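The performance compensation mechanism can be sketched as a gate on the brevity bonus: the bonus is granted only while group accuracy stays at or above a reference level, so shorter-but-degraded behavior is not reinforced. This is a minimal illustration under assumed names (`group_accuracy`, `ref_accuracy`, `alpha`), not the paper's exact reward.

```python
def compensated_reward(correct, length, group_lengths,
                       group_accuracy, ref_accuracy, alpha=0.5):
    """Length-shaped reward with performance compensation (illustrative).

    group_accuracy: accuracy over the current sampled group.
    ref_accuracy: reference accuracy (e.g., from the base policy).
    The brevity bonus applies only while accuracy is maintained; once
    accuracy degrades, the bonus is withdrawn so the policy is no
    longer rewarded for collapsing to minimal-length outputs.
    """
    base = 1.0 if correct else 0.0
    l_min, l_max = min(group_lengths), max(group_lengths)
    if l_max == l_min:
        return base
    brevity = 1.0 - (length - l_min) / (l_max - l_min)  # 1 = shortest
    if correct and group_accuracy >= ref_accuracy:
        return base + alpha * brevity  # safe to reward brevity
    # Accuracy degraded (or answer wrong): no length bonus.
    return base
```

The key property is that the gradient toward shorter outputs vanishes as soon as accuracy falls below the reference, which is what prevents the length-reward-hacking collapse described above.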


Main Results


Performance Comparison with Calibration Methods


Emergence of Reasoning Modes


BibTeX

@article{rui2025adathink,
  title={AdaThink-Med: Medical Adaptive Thinking with Uncertainty-Guided Length Calibration},
  author={Rui, Shaohao and Chen, Kaitao and Ma, Weijie and Wang, Xiaosong},
  journal={arXiv preprint arXiv:2509.24560},
  year={2025}
}