Self-Supervised Vision Transformers for Medical Image Segmentation with Limited Annotations

Priya Patel; Xiaofeng Liu; Thomas Müller

doi:10.55001/faids.v1i1.44

Review Articles

Published: 2026-05-01
DOI: https://doi.org/10.55001/faids.v1i1.44
Vol. 1 No. 1 (2026)

Abstract

Annotating medical images for segmentation is expensive and requires domain expertise. We propose MedSSL-ViT, a self-supervised pre-training framework for Vision Transformers (ViT) tailored to medical imaging. MedSSL-ViT combines masked image modeling with anatomy-aware contrastive learning, exploiting the structured nature of medical images. Pre-trained on 850K unlabeled chest X-rays and CT slices and fine-tuned with only 10% of the annotations, the model achieves state-of-the-art segmentation performance on four downstream tasks: lung segmentation (Dice: 97.2%), cardiac chamber segmentation (Dice: 93.5%), liver tumor segmentation (Dice: 78.8%), and retinal vessel segmentation (Dice: 82.1%). With just 1% of the labels, MedSSL-ViT still outperforms fully supervised baselines trained on 100% of the labels by 2-5% in Dice score.
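
For readers unfamiliar with the metric reported above, the following is a minimal sketch of the Dice score for binary segmentation masks, Dice = 2|A ∩ B| / (|A| + |B|). It is an illustration only and not the evaluation code used for MedSSL-ViT; the function name `dice_score`, the flat-list mask representation, and the choice to return 1.0 when both masks are empty are assumptions made for this example.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks (1 = foreground)."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels marked as foreground in both the prediction and the ground truth.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Sum of foreground pixel counts in each mask.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)

    if total == 0:
        # Both masks are empty. Returning 1.0 is a convention assumed here,
        # not something specified by the paper.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    # Toy 6-pixel masks: 2 overlapping foreground pixels, 3 foreground each.
    pred = [1, 1, 0, 0, 1, 0]
    target = [1, 0, 0, 0, 1, 1]
    print(f"Dice: {dice_score(pred, target):.3f}")  # prints "Dice: 0.667"
```

A reported value such as 97.2% corresponds to a score of 0.972 on this scale, typically averaged over the test images.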

Author Biographies

Priya Patel Department of Biomedical Informatics, Stanford University, Stanford, CA 94305, USA

Priya Patel is an associate professor at the Department of Biomedical Informatics, Stanford University, Stanford, CA 94305, USA. Their research focuses on biomedical engineering, with over 31 publications in peer-reviewed journals.

Xiaofeng Liu School of Computer Science, Fudan University, Shanghai 200433, China

Xiaofeng Liu is an associate professor at the School of Computer Science, Fudan University, Shanghai 200433, China. Their research focuses on energy systems, with over 70 publications in peer-reviewed journals.

Thomas Müller German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany

Thomas Müller is a research fellow at the German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany. Their research focuses on energy systems, with over 71 publications in peer-reviewed journals.

Self-Supervised Vision Transformers for Medical Image Segmentation with Limited Annotations. (2026). Frontiers in Artificial Intelligence and Data Science, 1(1). https://doi.org/10.55001/faids.v1i1.44
