RESEARCH REPORT
An annual synthesis of the most significant developments, evidence, and emerging challenges in clinical AI.
The State of Clinical AI Report is the inaugural annual synthesis of the most significant developments, evidence, and emerging challenges in clinical AI. It brings together a comprehensive, carefully curated view of where the field meaningfully advanced this year, spanning model performance, evaluation, workflows, and patient-facing tools, while also highlighting the gaps that remain.
Produced by ARISE (AI Research and Science Evaluation), a Stanford-Harvard Research Network, the report aims to make the landscape easier to navigate, support responsible adoption, and offer a shared reference point for clinicians, researchers, and health leaders as the field continues to evolve.
Takeaways from the report
A "jagged frontier" persists: while some models show superhuman capabilities on controlled tasks, they remain brittle at identifying their own uncertainty.
As traditional question-answering benchmarks saturate, benchmarking needs to shift toward multi-turn, unstructured real-world data, the real-world consequences of model errors, and administrative and workflow tasks.
Tokenized medical events, multi-agent orchestration, multimodal models, and reasoning fine-tuning are enabling advances in disease prediction and diagnosis, but they can also introduce system-level design trade-offs (a minimal sketch of event tokenization follows these takeaways).
While human-AI teams often outperform humans alone, there is substantial room for improvement in workflow design and failure-mode training to optimize success while mitigating automation bias and deskilling.
Measurable outcome improvements and safeguards against harm should be prioritized across use cases such as history taking, coaching, and translation; patients cannot be assumed to play any oversight role.
Research on model capability is dense, with studies across multiple medical specialties routinely showing incremental task-specific improvements. Randomized prospective trials have already commenced and should form the next wave of evidence in 2026.
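
To make the "tokenized medical events" idea above concrete, here is a minimal Python sketch of serializing structured EHR events into a token sequence a sequence model could consume. The event schema, the DAY_/DX_/LAB_/MED_ token vocabulary, and the lab-value buckets are illustrative assumptions, not the encoding used by any system covered in the report.

# A minimal sketch of "tokenized medical events": serializing structured EHR
# events into a single token sequence a transformer-style model could consume.
# The event schema, vocabulary, and bucketing rules here are illustrative
# assumptions, not the encoding of any specific model discussed in the report.
from dataclasses import dataclass

@dataclass
class MedicalEvent:
    days_from_admission: int    # coarse relative time, not absolute dates
    kind: str                   # e.g. "DX" (diagnosis), "LAB", "MED"
    code: str                   # e.g. an ICD-10 or LOINC-like code
    value: float | None = None  # numeric result for labs, else None

def bucket_lab_value(value: float) -> str:
    """Discretize a numeric lab result so it fits in a token vocabulary."""
    if value < 0.5:
        return "VAL_LOW"
    if value < 2.0:
        return "VAL_NORMAL"
    return "VAL_HIGH"

def tokenize_events(events: list[MedicalEvent]) -> list[str]:
    """Flatten a patient's event history, in time order, into model-ready tokens."""
    tokens = ["[BOS]"]
    for ev in sorted(events, key=lambda e: e.days_from_admission):
        tokens.append(f"DAY_{ev.days_from_admission}")
        tokens.append(f"{ev.kind}_{ev.code}")
        if ev.value is not None:
            tokens.append(bucket_lab_value(ev.value))
    tokens.append("[EOS]")
    return tokens

events = [
    MedicalEvent(0, "DX", "I21.9"),           # acute MI on admission
    MedicalEvent(0, "LAB", "TROPONIN", 3.1),  # elevated troponin
    MedicalEvent(1, "MED", "ASPIRIN"),
]
print(tokenize_events(events))
# ['[BOS]', 'DAY_0', 'DX_I21.9', 'DAY_0', 'LAB_TROPONIN', 'VAL_HIGH',
#  'DAY_1', 'MED_ASPIRIN', '[EOS]']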
Evaluate models in prospective and post-deployment real-world scenarios to yield evidence-based medicine
Prioritize human-computer interaction design in clinical decision support trials as much as primary outcomes
Innovate human–AI or agentic AI workflows to reduce clinical and administrative burden
Measure uncertainty, bias, and harm explicitly, especially for patient-facing AI (a calibration sketch follows this list)
Claim-level grounding and verification of reasoning traces are needed; measuring support, not fluency, will build user trust (see the second sketch below)
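
On measuring uncertainty explicitly: one simple, widely used metric is expected calibration error (ECE), sketched below in plain Python. The toy confidences and the bin count are illustrative assumptions; a real evaluation would use held-out clinical cases and report bias and harm metrics alongside calibration.

# A minimal sketch of one way to "measure uncertainty explicitly": expected
# calibration error (ECE) over a model's stated confidences. Toy data and
# bin count are illustrative, not drawn from the report.
def expected_calibration_error(confidences, correct, n_bins=10):
    """Mean |accuracy - confidence| across equal-width confidence bins,
    weighted by the fraction of predictions falling in each bin."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        in_bin = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not in_bin:
            continue
        avg_conf = sum(confidences[i] for i in in_bin) / len(in_bin)
        accuracy = sum(correct[i] for i in in_bin) / len(in_bin)
        ece += (len(in_bin) / n) * abs(accuracy - avg_conf)
    return ece

# Toy example: a model that is overconfident on hard cases.
confidences = [0.95, 0.90, 0.92, 0.60, 0.55]
correct =     [1,    1,    0,    1,    0]
print(f"ECE: {expected_calibration_error(confidences, correct):.3f}")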
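On claim-level grounding: the sketch below splits a reasoning trace into sentence-level claims and scores each one by its support in source evidence rather than by fluency. The word-overlap heuristic is a stand-in assumption; a production verifier would use an entailment (NLI) model, and the 0.5 threshold and example texts are hypothetical.

# A minimal sketch of claim-level grounding: score each sentence of a model's
# reasoning trace by how well source evidence supports it, not by how fluent
# it sounds. The overlap heuristic stands in for a real entailment verifier;
# the threshold and example texts are illustrative assumptions.
import re

def split_claims(trace: str) -> list[str]:
    """Treat each sentence of the reasoning trace as one checkable claim."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", trace) if s.strip()]

def support_score(claim: str, evidence: list[str]) -> float:
    """Best word overlap between the claim and any evidence snippet.
    (A production verifier would use an entailment/NLI model here.)"""
    claim_words = set(re.findall(r"[a-z]+", claim.lower()))
    best = 0.0
    for snippet in evidence:
        ev_words = set(re.findall(r"[a-z]+", snippet.lower()))
        if claim_words:
            best = max(best, len(claim_words & ev_words) / len(claim_words))
    return best

trace = "Troponin is elevated. The patient is allergic to penicillin."
evidence = ["Lab result: troponin elevated at 3.1 ng/mL."]

for claim in split_claims(trace):
    score = support_score(claim, evidence)
    flag = "supported" if score >= 0.5 else "UNSUPPORTED"
    print(f"{flag:>11}  {score:.2f}  {claim}")
# The second claim is flagged UNSUPPORTED: nothing in the evidence backs it,
# however fluently it is phrased.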
Peter Brodeur, Ethan Goh, Adam Rodman, Jonathan Chen
We would like to thank the following reviewers and designers for generously providing feedback: Emily Tat, Liam McCoy, David Wu, Priyank Jain, Rebecca Handler, Jason Hom, Laura Zwaan, Vishnu Ravi, Brian Han, Kevin Schulman, Kathleen Lacar, Kameron Black, Adi Badhwar, Adrian Haimovich, Eric Horvitz