Author: Stephen R. Bowen, Richard Cheng, Kylie Kang, Janice Kim, Ana Paula Santos Lima, Dominic A. Maes, Juergen Meyer, Karen Ordovas, Kerry Reding
Affiliation: Department of Radiation Oncology, University of Washington; Department of Radiation Oncology, Fred Hutchinson Cancer Center, University of Washington; Department of Radiology, University of Washington; Division of Cardiology, University of Washington; Department of Biobehavioral Nursing and Health Informatics, School of Nursing, University of Washington
Purpose: Artificial intelligence (AI)-based auto-segmentation tools can increase the efficacy and reproducibility of radiotherapy (RT) treatment planning. This study evaluates the quality of AI-generated cardiac substructures and investigates the dosimetric differences resulting from geometric discrepancies.
Methods: Using LimbusAI-RadFormation, seven cardiac substructures were auto-segmented on 32 CT scans from the UPBEAT breast RT cohort: the heart, anterior aorta, left anterior descending artery (LAD), pulmonary artery, inferior vena cava, superior vena cava, and left ventricle. A cardiothoracic radiologist (human expert) manually contoured the same structures for comparison and validation. All patients underwent photon-based RT targeting the left breast. Geometric agreement was quantified using the Dice Similarity Coefficient (DSC). Dosimetric analysis compared minimum, maximum, and mean doses between AI-generated and expert contours using the Wilcoxon signed-rank test.
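For reference, the DSC between two contours A and B is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical). The following is a minimal sketch only, assuming each contour has already been rasterized to a set of (z, y, x) voxel indices; the function name `dice_similarity`, the empty-contour convention, and the toy voxel sets are illustrative assumptions, not the study's software or data.

```python
def dice_similarity(voxels_a, voxels_b):
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|).

    Inputs are iterables of hashable voxel indices, e.g. (z, y, x) tuples.
    """
    a, b = set(voxels_a), set(voxels_b)
    total = len(a) + len(b)
    if total == 0:
        # Assumed convention: two empty contours are treated as identical.
        return 1.0
    return 2 * len(a & b) / total


# Toy example (NOT study data): a one-voxel-wide, vessel-like structure.
# The hypothetical "expert" contour spans 10 slices, the "AI" contour only 4.
expert_voxels = {(z, 5, 5) for z in range(10)}
ai_voxels = {(z, 5, 5) for z in range(4)}

dsc = dice_similarity(expert_voxels, ai_voxels)
print(f"DSC = {dsc:.3f}")
```

In this toy case the two contours agree perfectly in-plane, yet the DSC is only 2·4 / (10 + 4) ≈ 0.57, which illustrates how a difference in slice extent alone can lower the DSC of a thin tubular structure.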
Results: A total of 215 contours were generated. Median DSC values for AI-generated contours were high for most cardiac substructures: heart 1.0 (range: 0.9-1.0), anterior aorta 1.0 (range: 0.9-1.0), pulmonary artery 1.0 (range: 0.9-1.0), inferior vena cava 1.0 (range: 0.5-1.0), superior vena cava 1.0 (range: 0.9-1.0), and left ventricle 1.0 (range: 0.9-1.0). The median DSC was lower for the LAD, 0.4 (range: 0.1-0.6), driven primarily by variations in the superior-inferior extent between AI and expert contours. Nevertheless, dosimetric differences between AI-generated and expert contours were not statistically significant (p > 0.05) for minimum, maximum, or mean dose, even for the LAD.
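The paired dose comparison can be illustrated with a small sketch of the Wilcoxon signed-rank test. The abstract does not state which software or test variant was used; the code below assumes the exact two-sided variant (enumerating all sign assignments, practical only for small samples), and the function name `wilcoxon_signed_rank_exact` and the dose values are hypothetical toy inputs, not study data.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples.

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank the absolute differences, averaging ranks within tie groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = sum(ranks) / 2  # expected W_plus under the null hypothesis
    observed = abs(w_plus - mean_w)

    # Under the null, each difference is positive or negative with equal
    # probability, so enumerate all 2^n sign patterns.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - mean_w) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Toy paired mean doses in cGy for six hypothetical patients (NOT study data).
ai_dose = [310, 240, 500, 420, 180, 290]
expert_dose = [300, 260, 470, 460, 130, 350]

w_plus, p_value = wilcoxon_signed_rank_exact(ai_dose, expert_dose)
print(f"W+ = {w_plus}, p = {p_value:.3f}")
```

A p-value above 0.05 from such a test means the paired differences show no systematic shift in either direction at that significance level; it does not by itself establish that the two sets of contours are dosimetrically equivalent.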
Conclusion: The AI-generated contours generally showed excellent agreement with those created by a human expert, except for the LAD, where variations in the superior-inferior extent were noted. Despite this, the dosimetric impact of inaccuracies in the AI-generated cardiac substructure contours was negligible, indicating their acceptability for breast RT planning and evaluation. However, the variability in LAD contours warrants further investigation to ensure clinical suitability, particularly when applying LAD-specific planning constraints.