Author: Zong Fan, Fan Lam, Hua Li, Rita Huan-Ting Peng, Yuan Yang
Affiliation: University of Illinois Urbana-Champaign; Washington University School of Medicine
Purpose: Accurate lesion segmentation in MRI is critical for early diagnosis, treatment planning, and monitoring of disease progression in neurological disorders. Cross-site MRI data can alleviate data scarcity and improve deep-learning model training, but they also introduce domain variability, lesion diversity, and image heterogeneity, making model generalization difficult. Foundation models are deep-learning models trained on large, diverse datasets to capture broad knowledge; they can then be flexibly adapted to downstream tasks by fine-tuning with limited labeled samples. In this study, we propose a foundation model that leverages multimodal, diverse data and integrates self-supervised learning to improve lesion segmentation performance and model generalization.
Methods: The proposed framework consists of (1) a foundation encoder built on the Vision Transformer and trained with self-distillation with no labels (DINO), a self-supervised learning method, on a large set of unlabeled multimodal MRI scans to learn generalized feature representations, and (2) a segmentation decoder that is fine-tuned with limited labeled data using supervised learning. By integrating features from multiple MRI modalities through self-supervised learning, the model improves segmentation accuracy while addressing the challenges posed by lesion heterogeneity and limited annotated data.
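The DINO self-distillation objective used for pretraining can be sketched as follows. This is a minimal NumPy illustration, not the study's implementation: a teacher network's output is centered and temperature-sharpened to form the cross-entropy target for a student network, and the teacher's weights track an exponential moving average (EMA) of the student's. The temperatures and momentum value shown are illustrative defaults.

```python
import numpy as np

def softmax(logits, temp):
    """Temperature-scaled softmax, numerically stabilized."""
    z = logits / temp
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the centered, sharpened teacher
    distribution and the student distribution. In practice the
    teacher branch receives no gradient."""
    t = softmax(teacher_logits - center, teacher_temp)  # center + sharpen
    s = softmax(student_logits, student_temp)
    return -(t * np.log(s + 1e-12)).sum(axis=-1).mean()

def ema_update(teacher_w, student_w, momentum=0.996):
    """Teacher parameters follow an EMA of the student parameters."""
    return momentum * teacher_w + (1 - momentum) * student_w
```

The centering term discourages one output dimension from dominating, while the low teacher temperature sharpens the target distribution; together they are what prevents representation collapse without negative pairs.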
Results: We evaluated the model on two groups of MRI images, post-traumatic epilepsy (PTE) and stroke, each including T1-weighted, diffusion-weighted, and FLAIR images to capture pathological variations. Our model outperformed two widely used segmentation networks, UNet++ and TransUNet, in both studies. For stroke lesion segmentation, the model achieved a mean Intersection over Union (mIoU) of 86.48% and a mean Dice coefficient (mDice) of 92.14%, while attaining an mIoU of 48.1% and an mDice of 65.2% for PTE lesion segmentation.
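The two reported metrics can be computed from binary segmentation masks as follows. This is an illustrative sketch of the standard IoU and Dice definitions, not the study's evaluation code:

```python
import numpy as np

def iou_and_dice(pred, gt):
    """Intersection over Union and Dice coefficient for binary masks.
    IoU = |P ∩ G| / |P ∪ G|; Dice = 2|P ∩ G| / (|P| + |G|).
    Empty prediction and ground truth are scored as a perfect match."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return iou, dice
```

For example, masks `[1, 1, 1, 0]` and `[0, 1, 1, 1]` overlap on two voxels out of four in their union, giving an IoU of 0.5 and a Dice of 2/3; Dice is always at least as large as IoU, which is consistent with mDice exceeding mIoU in both reported studies.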
Conclusion: The proposed foundation-model-based approach mitigates the reliance on large-scale labeled datasets, demonstrating strong generalization and potential for improving lesion segmentation, especially with small and diverse datasets.