Author: Jeffrey D. Bradley, Steven J. Feigenberg, Cole Friedes, Yin Gao, Xun Jia, Kevin Teo, Lingshu Yin, Jennifer Wei Zou
Affiliation: Department of Radiation Oncology, University of Pennsylvania; Johns Hopkins University
Purpose: Understanding how physicians evaluate treatment plans is critical for automatic planning and for ensuring consistent, high-quality care. While deep-learning models excel at complex decision-making, their lack of transparency raises safety concerns, making interpretability essential. This study develops a novel Explainable AI (XAI) framework for lung radiotherapy to reveal how physicians assess plan acceptability and to suggest improvements, using a generative adversarial network (GAN) with attention gates (AG).
Methods: The XAI model takes PTV and OAR masks, along with the dose distribution, as input for evaluation. The GAN framework includes a primary plan evaluation task, which predicts the probability of dose approval, and an auxiliary dose prediction task, which generates additional training data for the main task. The two tasks are implemented as independent networks, each incorporating AG. For plan evaluation, the AG module captures spatial features, localizing attention to the regions most relevant for decision-making. For interpretability, attention maps are generated to visualize the attended regions, and t-SNE is applied to examine discriminative features in a lower-dimensional space. The classification performance of the plan evaluation task is assessed using ROC analysis and confusion matrices. Dose prediction is evaluated by comparing predictions with ground-truth dosimetric parameters. The XAI model was developed using data from 80 advanced lung cancer patients treated to 60/66/70 Gy (2 Gy/fraction) with VMAT, using 5-fold cross-validation. Additionally, 10 patients treated by a different physician were evaluated to interpret inter-physician variation.
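To make the attention-gate mechanism concrete, the sketch below shows a minimal 3D additive attention gate in PyTorch. The layer names, channel counts, and single-gate layout are illustrative assumptions, not the study's actual architecture; it only demonstrates how a gating signal re-weights skip-connection features and yields an attention map that can be visualized.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGate3D(nn.Module):
    """Additive attention gate (sketch): a coarse gating signal g re-weights
    skip-connection features x, localizing attention to decision-relevant
    regions (e.g., PTV boundaries and nearby OARs)."""
    def __init__(self, x_channels, g_channels, inter_channels):
        super().__init__()
        self.theta_x = nn.Conv3d(x_channels, inter_channels, kernel_size=1)
        self.phi_g = nn.Conv3d(g_channels, inter_channels, kernel_size=1)
        self.psi = nn.Conv3d(inter_channels, 1, kernel_size=1)

    def forward(self, x, g):
        # Project both inputs to a common feature space; upsample the gating
        # term so the spatial grids match before the additive combination.
        theta = self.theta_x(x)
        phi = F.interpolate(self.phi_g(g), size=theta.shape[2:],
                            mode="trilinear", align_corners=False)
        alpha = torch.sigmoid(self.psi(F.relu(theta + phi)))  # attention coefficients in [0, 1]
        return x * alpha, alpha  # gated features and the map used for visualization

# Toy usage: dose and PTV/OAR masks encoded as multi-channel 3D features (shapes are arbitrary).
x = torch.randn(1, 32, 16, 64, 64)   # skip-connection features
g = torch.randn(1, 64, 8, 32, 32)    # coarser gating signal from a deeper layer
gated, attention_map = AttentionGate3D(32, 64, 16)(x, g)
print(gated.shape, attention_map.shape)
```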
Results: AG improves prediction accuracy, and the resulting attention maps align with human intuition. t-SNE visualization shows that AG further separates the feature distributions of the two classes in feature space. Cross-validation yields an accuracy, sensitivity, specificity, and AUC of 0.86±0.05, 0.93±0.04, 0.79±0.04, and 0.92±0.03, respectively. XAI also generalizes well across physicians, achieving a classification accuracy of 0.83.
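For reference, the snippet below shows how the reported classification metrics (accuracy, sensitivity, specificity, AUC) would typically be derived from approval labels and predicted approval probabilities via a confusion matrix and ROC analysis in scikit-learn. The toy labels, scores, and the 0.5 decision threshold are illustrative assumptions, not values from the study.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# y_true: physician approval labels (1 = approved), y_score: predicted approval probability
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_score = np.array([0.91, 0.22, 0.78, 0.65, 0.41, 0.88, 0.30, 0.55])
y_pred = (y_score >= 0.5).astype(int)  # assumed decision threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # true-positive rate: approved plans correctly identified
specificity = tn / (tn + fp)   # true-negative rate: rejected plans correctly identified
auc = roc_auc_score(y_true, y_score)
print(f"acc={accuracy:.2f} sens={sensitivity:.2f} spec={specificity:.2f} auc={auc:.2f}")
```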
Conclusion: The XAI framework with AG effectively evaluates lung radiotherapy plans with high accuracy and interpretability, demonstrating potential for quality assurance and automatic planning. Attention maps and t-SNE improve the explainability of XAI's decision-making, boosting trust in AI-driven evaluations.