Foundation Models with Balanced Data Sampling Enhance Auto-Segmentation for Cardiac Substructures

Authors: Chloe Min Seo Choi, Nikhil Mankuzhy, Aneesh Rangnekar, Andreas Rimner, Maria Thor, Harini Veeraraghavan, Abraham Wu

Affiliations: Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center; Department of Medical Physics, Memorial Sloan Kettering Cancer Center; Memorial Sloan Kettering Cancer Center

Abstract:

Purpose: Irradiation of cardiac substructures predisposes patients to poor outcomes in thoracic radiation therapy. A deep learning model was developed to segment cardiac substructures invariant to contrast enhancement, thoracic tumor site, and patient positioning.
Methods: The discovery cohort included 240 planning CTs (PCTs) from lung cancer patients scanned in the supine position. Of these, 180 CTs (contrast-enhanced CT (CECT): N=56; non-contrast CT (NCCT): N=124) were used to train the segmentation model with 3-fold cross-validation; the remaining 60 scans (CECT: N=24, NCCT: N=36) were set aside for testing. A secondary test cohort of 66 PCTs from breast cancer patients scanned in the supine/prone position (N=45/21) was also evaluated. A 3D model consisting of a published pretrained transformer encoder and a U-Net-style decoder initialized with random weights was trained to segment the aorta, pulmonary artery, pulmonary vein, superior vena cava, and inferior vena cava. Manual delineations by institutional radiation oncologists, following predetermined labeling criteria for each cardiac substructure, served as the reference. Three configurations, oracle (CECT: N=56/NCCT: N=124), CECT-only (N=56), and balanced (CECT: N=32/NCCT: N=32), were trained and evaluated using the Dice similarity coefficient (DSC).
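
For illustration only, the minimal Python sketch below shows how the three training configurations might be assembled from the CECT/NCCT case pools and how the DSC metric is computed on binary masks. The function and variable names (e.g., `build_configurations`, `cect_ids`) are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Dice similarity coefficient between two binary masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)

def build_configurations(cect_ids, ncct_ids, seed: int = 0):
    """Return the oracle, CECT-only, and balanced training sets.

    cect_ids / ncct_ids are assumed case-identifier lists
    (56 CECT and 124 NCCT cases in the discovery training split).
    """
    rng = np.random.default_rng(seed)
    n_balanced = 32  # per the balanced configuration (CECT: N=32 / NCCT: N=32)
    return {
        "oracle": list(cect_ids) + list(ncct_ids),            # all 180 training cases
        "cect_only": list(cect_ids),                           # 56 CECT cases only
        "balanced": list(rng.choice(cect_ids, n_balanced, replace=False))
                  + list(rng.choice(ncct_ids, n_balanced, replace=False)),
    }
```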
Results: Models trained with the balanced and oracle configurations demonstrated comparable performance on both the primary (DSC: 0.80 ± 0.10 vs. 0.81 ± 0.10) and secondary (0.77 ± 0.13 vs. 0.80 ± 0.12) cohorts, with no statistically significant difference. In contrast, the model trained solely on CECT performed significantly worse than the oracle configuration, achieving 0.77 ± 0.12 (p = 0.021) and 0.75 ± 0.15 (p = 0.008) on the respective cohorts.
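
The abstract does not state which statistical test produced the reported p-values; assuming a paired comparison of per-case DSC values between two configurations, a two-sided Wilcoxon signed-rank test from SciPy is one plausible choice, sketched below. The variable names (`dsc_oracle`, `dsc_cect_only`) are placeholders.

```python
from scipy.stats import wilcoxon

def compare_configurations(dsc_oracle, dsc_cect_only):
    """Paired two-sided Wilcoxon signed-rank test on per-case DSC values
    from the same test cases; returns the p-value."""
    _, p_value = wilcoxon(dsc_oracle, dsc_cect_only)
    return p_value
```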
Conclusion: Our results suggest that robust performance on PCTs can be achieved with fewer training examples (64 versus 180) when balanced proportions of contrast conditions are used. This highlights the value of careful data curation to balance imaging variations rather than relying on large or exclusively CECT datasets. The results on the breast cancer cohort further demonstrate the model's adaptability to diverse clinical scenarios.
