Stanford Researchers Build SleepFM Clinical: A Multimodal Sleep Foundation AI Model for 130+ Disease Prediction

A team of Stanford Medicine researchers has introduced SleepFM Clinical, a multimodal sleep foundation model that learns from clinical polysomnography and predicts long-term disease risk from a single night of sleep. The work has been published in Nature Medicine, and the team has released the clinical code as the open-source sleepfm-clinical repository on GitHub under an MIT license.

From overnight polysomnography to general representation

Polysomnography records brain activity, eye movements, heart signals, muscle tone, breathing effort, and oxygen saturation throughout the night in a sleep laboratory. It is the gold-standard test in sleep medicine, but most clinical workflows use it only for sleep staging and sleep apnea diagnosis. The research team treats these multichannel signals as a dense physiological time series and trains a foundation model to learn shared representations across modalities.
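As a rough illustration of how such a recording can be organized for modeling, the sketch below groups channels into modality arrays and cuts them into standard 30-second scoring epochs. The channel counts, 128 Hz sampling rate, and shortened one-hour duration are illustrative assumptions, not values from the paper:

```python
import numpy as np

SAMPLE_RATE = 128    # Hz, assumed for illustration
EPOCH_SECONDS = 30   # standard sleep-scoring epoch length

rng = np.random.default_rng(0)

def make_night(hours=1.0):
    """Synthetic stand-in for one polysomnography recording."""
    n = int(hours * 3600 * SAMPLE_RATE)
    return {
        "brain":       rng.standard_normal((4, n)),  # e.g. EEG channels
        "heart":       rng.standard_normal((1, n)),  # e.g. ECG
        "respiration": rng.standard_normal((3, n)),  # e.g. airflow, effort, SpO2
    }

def to_epochs(signal):
    """Split a (channels, samples) array into 30 s epochs."""
    chans, n = signal.shape
    step = EPOCH_SECONDS * SAMPLE_RATE
    n_epochs = n // step
    return signal[:, : n_epochs * step].reshape(chans, n_epochs, step)

night = make_night()
epochs = {name: to_epochs(sig) for name, sig in night.items()}
print({name: e.shape for name, e in epochs.items()})
```

Each modality becomes a (channels, epochs, samples-per-epoch) tensor, the natural input shape for a per-segment encoder.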

SleepFM was trained on approximately 585,000 hours of sleep recordings from roughly 65,000 people, drawn from multiple cohorts. The largest cohort comes from the Stanford Sleep Medicine Center, where nearly 35,000 adults and children were studied overnight between 1999 and 2024. That clinical cohort is linked to the electronic health record, which enables survival analysis for hundreds of disease categories.

https://www.nature.com/articles/s41591-025-04133-4

Model architecture and pre-training objectives

At the modeling level, SleepFM uses a convolutional backbone to extract local features from each channel, followed by attention-based aggregation across channels and a temporal transformer that operates on short segments of the night. The same core architecture appeared in earlier SleepFM work on sleep staging and sleep disordered breathing detection, where learning joint embeddings across brain activity, electrocardiography, and respiration signals was found to improve downstream performance.
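The channel-aggregation step can be caricatured in a few lines: a per-channel feature extractor followed by softmax attention that pools all channels into one segment embedding. This is a toy numpy sketch under assumed shapes, not the actual SleepFM architecture (which uses learned convolutions and a temporal transformer over the resulting segment embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_features(x, kernel, stride):
    """Toy 1D conv backbone: strided moving-average features per channel."""
    n = (len(x) - kernel) // stride + 1
    return np.array([x[i * stride : i * stride + kernel].mean() for i in range(n)])

def channel_attention(feats, w):
    """Softmax-weighted aggregation across channels into one embedding."""
    scores = feats @ w                      # one attention score per channel
    a = np.exp(scores - scores.max())
    a /= a.sum()
    return a @ feats                        # weighted sum over channels

# One 30 s segment: 8 channels at an assumed 128 Hz sampling rate
segment = rng.standard_normal((8, 30 * 128))
feats = np.stack([conv_features(c, kernel=128, stride=64) for c in segment])
w = rng.standard_normal(feats.shape[1])     # toy attention weights
embedding = channel_attention(feats, w)
print(embedding.shape)
```

In the real model the temporal transformer would then attend over the sequence of such per-segment embeddings across the night.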

Pre-training uses a leave-one-out contrastive learning objective. For each short time segment, the model creates separate embeddings for each modality group, such as brain signals, heart signals, and respiratory signals, and then learns to align these modalities so that any subset can predict the joint representation of the remaining modalities. This approach makes the model robust to missing channels and heterogeneous recording montages, which are common in real-world sleep laboratories.
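A minimal sketch of a leave-one-out contrastive objective of this kind is below: each held-out modality must identify its own segment within a batch from the averaged embeddings of the remaining modalities (an InfoNCE-style loss). The batch size, embedding dimension, and temperature are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def leave_one_out_nce(emb, held_out, temp=0.1):
    """Leave-one-out contrastive loss over a batch.

    emb: dict of modality -> (batch, dim) embeddings. The held-out modality
    must match its own segment (the diagonal) against all other segments.
    """
    query = l2norm(emb[held_out])
    rest = [v for k, v in emb.items() if k != held_out]
    key = l2norm(np.mean(rest, axis=0))         # average of remaining modalities
    logits = query @ key.T / temp               # (batch, batch) similarities
    # InfoNCE: diagonal entries are the positive pairs
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

batch, dim = 16, 32
emb = {m: rng.standard_normal((batch, dim))
       for m in ("brain", "heart", "respiration")}
loss = np.mean([leave_one_out_nce(emb, m) for m in emb])
print(float(loss))
```

Averaging over all choices of held-out modality gives a symmetric training signal, and a missing modality at inference time simply drops out of the average.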

After pre-training on unlabeled polysomnography, the backbone is frozen and small task-specific heads are trained. For standard sleep tasks, a lightweight recurrent or linear head maps embeddings to sleep stages or apnea labels. For clinical risk prediction, the model aggregates the entire night into a single patient-level embedding, adds basic demographics such as age and sex, and feeds this representation into a Cox proportional hazards layer for time-to-event modeling.
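The survival head can be illustrated with the Breslow negative log partial likelihood of a linear Cox model on concatenated embedding and demographic features. Everything here (dimensions, the synthetic times and events, the random coefficients) is made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def cox_neg_log_partial_likelihood(x, beta, time, event):
    """Breslow negative log partial likelihood for a linear Cox head."""
    risk = x @ beta                       # log hazard ratios
    order = np.argsort(-time)             # sort by descending event time
    risk = risk[order]
    ev = event[order].astype(bool)
    # running log-sum-exp = log of the risk set (patients with time >= t_i)
    log_cumsum = np.logaddexp.accumulate(risk)
    return -np.sum(risk[ev] - log_cumsum[ev]) / max(ev.sum(), 1)

n, d = 100, 8
night_emb = rng.standard_normal((n, d))   # frozen-backbone patient embeddings
demo = rng.standard_normal((n, 2))        # e.g. standardized age, sex
x = np.hstack([night_emb, demo])
beta = rng.standard_normal(x.shape[1]) * 0.1
time = rng.exponential(5.0, n)            # years to diagnosis or censoring
event = rng.integers(0, 2, n)             # 1 = diagnosed, 0 = censored
nll = cox_neg_log_partial_likelihood(x, beta, time, event)
print(float(nll))
```

In practice this loss would be minimized over beta (e.g. with a library such as lifelines or a gradient-based fit); only the small head is trained while the embedding stays fixed.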

Benchmark on sleep staging and apnea

Before moving on to disease prediction, the research team verified that SleepFM competes with specialist models on standard sleep analysis tasks. Earlier work has already shown that a simple classifier on top of SleepFM embeddings outperforms end-to-end convolutional networks on sleep stage classification and sleep disordered breathing detection, with gains in macro AUROC and AUPRC on multiple public datasets.

In the clinical study, the same pre-trained backbone is reused for sleep staging and apnea severity classification across multi-center cohorts. The results reported in the paper show that SleepFM matches or surpasses existing tools such as traditional convolutional models and other automated sleep staging systems, confirming that the representation captures core sleep physiology rather than statistical artifacts of a single dataset.

One night of sleep predicts 130 diseases and mortality

The main contribution of the paper is disease prediction. The research team mapped diagnosis codes in the Stanford electronic health record to phecodes and defined more than 1,000 candidate disease groups. For each phecode, they computed the time to first diagnosis after a sleep study and fit a Cox model on top of the SleepFM embedding.
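The bookkeeping step, mapping diagnosis codes to phecodes and computing time to first post-study diagnosis, might look like the toy sketch below. The ICD-to-phecode entries and dates are invented for illustration; the real mapping is the published ICD-to-phecode table:

```python
from datetime import date

# Hypothetical mapping entries, invented for this sketch
ICD_TO_PHECODE = {"I21.9": "411.2", "I50.9": "428.1", "G30.9": "290.1"}

def time_to_first_diagnosis(study_date, diagnoses):
    """Return {phecode: (years_to_first_diagnosis, event=1)} after the study.

    Diagnoses on or before the sleep study are excluded: the task is to
    predict future disease, not to detect existing conditions.
    """
    first = {}
    for icd, dx_date in diagnoses:
        phe = ICD_TO_PHECODE.get(icd)
        if phe is None or dx_date <= study_date:
            continue
        years = (dx_date - study_date).days / 365.25
        if phe not in first or years < first[phe][0]:
            first[phe] = (years, 1)
    return first

study = date(2015, 6, 1)
dx = [("I21.9", date(2018, 3, 1)),
      ("I50.9", date(2016, 1, 15)),
      ("I21.9", date(2020, 1, 1)),   # repeat diagnosis, first one kept
      ("G30.9", date(2014, 5, 1))]   # pre-study, excluded
events = time_to_first_diagnosis(study, dx)
print(sorted(events))
```

Patients who never receive a given phecode would enter the Cox model as censored observations at their last follow-up date.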

SleepFM identifies 130 disease outcomes whose risks can be predicted from a single night of polysomnography with strong discrimination. These include all-cause mortality, dementia, myocardial infarction, heart failure, chronic kidney disease, stroke, atrial fibrillation, several cancers, and numerous psychiatric and metabolic disorders. For many of these conditions, performance metrics such as concordance index and area under the receiver operating characteristic curve are on par with established risk scores, even though the model uses only sleep recordings and basic demographics.

The paper also reports that for some cancers, pregnancy complications, circulatory conditions, and mental health disorders, SleepFM-based predictions reach approximately 80 percent accuracy for multi-year risk windows. This suggests that subtle patterns in the coordination between brain, heart, and breathing signals carry information about latent disease processes that are not yet clinically visible.

Comparison with simple baselines

To assess the added value, the research team compared the SleepFM-based risk model to two baselines. The first uses only demographic characteristics such as age, sex, and body mass index. The second trains an end-to-end model directly on polysomnography and outcomes, without self-supervised pre-training. In most disease categories, the pre-trained SleepFM representation combined with a simple survival head produces higher concordance and higher long-term AUROC than both baselines.
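Concordance, the standard discrimination metric for survival models, can be computed as Harrell's C-index: the fraction of comparable patient pairs in which the patient who experiences the event earlier also has the higher predicted risk. A small sketch on toy data:

```python
import numpy as np
from itertools import combinations

def concordance_index(time, event, risk):
    """Harrell's C-index: fraction of comparable pairs ordered correctly."""
    num, den = 0.0, 0.0
    for i, j in combinations(range(len(time)), 2):
        if time[i] == time[j]:
            continue
        first = i if time[i] < time[j] else j      # earlier time in the pair
        other = j if first == i else i
        if not event[first]:
            continue  # censored before the other's time: pair not comparable
        den += 1
        if risk[first] > risk[other]:
            num += 1
        elif risk[first] == risk[other]:
            num += 0.5
    return num / den

time = np.array([2.0, 5.0, 1.0, 8.0])    # years to event or censoring
event = np.array([1, 1, 0, 1])           # 1 = event observed, 0 = censored
risk = np.array([0.9, 0.4, 0.8, 0.1])    # higher risk should mean earlier event
print(concordance_index(time, event, risk))
```

On this toy data every comparable pair is ordered correctly, so the C-index is 1.0; a value of 0.5 corresponds to random ranking.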

This research suggests that the benefit comes less from a complex prediction head and more from a foundation model that has learned a general representation of sleep physiology. In practice, this means that clinical centers can reuse a pre-trained backbone, train small site-specific heads on relatively modest labeled cohorts, and still reach state-of-the-art performance.




Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of Marktechpost, an Artificial Intelligence media platform known for its in-depth coverage of Machine Learning and Deep Learning news that is both technically sound and easily understood by a wide audience. The platform boasts over 2 million monthly views.
