Author: Daniel A. Alexander, Jonathan Baron, Brook Kennedy Byrd, William Ross Green, Bolin Li, Rafe A. McBeth, Abigail Pepin, Steven Philbrook
Affiliation: Department of Radiation Oncology and Applied Sciences, Department of Radiation Oncology, University of Pennsylvania, Thayer School of Engineering, University of Pennsylvania
Purpose: As accelerated partial breast irradiation (APBI) gains traction, a rapid workflow from simulation to completion of treatment is an attractive option for patients. While organ-at-risk (OAR) autocontouring and autoplanning solutions are now commercially available, no commercial solution exists for practice-specific target autocontouring, which is a major hurdle to the effective deployment of rapid sim-to-start solutions. This study aims to evaluate an nnU-Net model for APBI target segmentation trained on consistent, practice-specific data from the University of Pennsylvania.
Methods: An nnU-Net segmentation model was initially trained on 533 previously planned APBI cases. After its failure modes were examined, a refined model was trained on a subset of the original training cases, 184/533 (35%), that represented consistent physician practice. Following this refinement, a cohort of 48 new APBI cases was selected for testing, and each case was scored on a scale of 1-3 for the difficulty of identifying the lumpectomy cavity (3 = most difficult). Fifteen of the 48 test cases (31%) were scored as 3, indicating high difficulty in distinguishing the lumpectomy cavity, and were excluded. Physician contours were standardized as previously described in the Florence Trial. Finally, quantitative themes were identified for the observed failure modes among the remaining 33 test cases.
Results: Mean GTV Dice scores differed significantly between cases scored as 1 and 2 (GTV Dice score = 0.81±0.06 vs. 0.71±0.11, t-test p = 0.02), whereas mean PTV Dice scores did not differ significantly between these categories (PTV Dice score = 0.86±0.06 vs. 0.79±0.08, t-test p = 0.93). Common modes of model failure included over-contouring of surgical clips, both within the ipsilateral breast and contralaterally, and difficulty distinguishing dense breast tissue from seroma.
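The Dice similarity coefficient reported above is DSC = 2|A ∩ B| / (|A| + |B|), where A and B are the physician and model contours. The abstract does not state how the scores were computed, so the following is only a minimal sketch under the assumption that both contours have been rasterized to binary voxel masks on the same grid. The function names `dice_coefficient` and `fraction_above` are hypothetical, and scoring two empty masks as 1.0 is an assumed convention.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b) -> float:
    """Dice similarity coefficient 2|A & B| / (|A| + |B|) for two binary masks.

    Assumes both masks share the same voxel grid (hypothetical helper).
    """
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must be defined on the same voxel grid")
    total = int(a.sum()) + int(b.sum())
    if total == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0
    overlap = int(np.logical_and(a, b).sum())
    return 2.0 * overlap / total


def fraction_above(scores, threshold: float = 0.80) -> float:
    """Fraction of cases whose Dice score strictly exceeds the threshold."""
    scores = np.asarray(scores, dtype=float)
    return float((scores > threshold).mean())


# Toy example: a 2x2x2 "physician" cube (8 voxels) and a model contour that
# extends one slice further (12 voxels); the 8 shared voxels give 16 / 20.
physician = np.zeros((4, 4, 4), dtype=bool)
physician[1:3, 1:3, 1:3] = True
model = np.zeros((4, 4, 4), dtype=bool)
model[1:3, 1:3, 1:4] = True
print(dice_coefficient(physician, model))  # 0.8
```

Because Dice penalizes both over- and under-contouring symmetrically, an over-contoured clip region like the one in the toy example lowers the score even when the true target is fully covered.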
Conclusion: Despite small differences in GTV contours, high PTV Dice scores (>0.80), seen in 68% of cases, were consistent with the qualitatively high agreement observed between physician and AI-generated contours.