Evaluating Uncertainty Estimation Models for Clinical Integration of AI-Generated Radiotherapy Dose Distributions

Authors: Jacob S. Buatti, Kristen A. Duke, Malena Fassnacht, Neil A. Kirby, Parker New, Niko Papanikolaou, Arkajyoti Roy, Yuqing Xia, Michelle de Oliveira

Affiliations: The University of Texas San Antonio, UT Southwestern Medical Center, UT Health San Antonio

Abstract:

Purpose: Quantifying and visualizing uncertainty is critical for building clinical trust in AI-generated dose distributions. This study evaluates Monte Carlo Dropout (MCD), Snapshot Ensemble (SE), and Bayesian Neural Network (BNN) methods for generating uncertainty maps in a Patch-GAN model trained to predict dose distributions for head and neck (H&N) radiotherapy. The relationship between predicted uncertainty and observed errors was assessed across three thresholds: low (5%), medium (10%), and high (20%) of the prescription dose.
Methods: A Patch-GAN model was trained to predict dose distributions from 155 H&N volumetric modulated arc therapy (VMAT) plans prescribed to 6996 cGy. Uncertainty maps were generated using MCD (15 samples, dropout rate 0.4), SE (14 models), and BNN (variational inference). Uncertainty maps were computed as the standard deviation of predictions, and error maps as the absolute difference between predicted and ground-truth doses. Overlap and Pearson correlation coefficients were used to evaluate the relationship between uncertainty and error at thresholds of 3.5 Gy, 7 Gy, and 14 Gy (approximately 5%, 10%, and 20% of the prescription dose).
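The map and metric definitions in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the flattened-voxel-list representation, the use of the population standard deviation, and the definition of overlap as intersection over union of thresholded voxels (with a separate, hypothetical uncertainty cutoff `unc_thr`) are all assumptions made for illustration; the abstract does not specify them.

```python
from math import sqrt
from statistics import mean, pstdev

PRESCRIPTION_GY = 69.96  # 6996 cGy, as stated in the abstract


def thresholds_gy(prescription_gy=PRESCRIPTION_GY, fractions=(0.05, 0.10, 0.20)):
    """Error thresholds as fractions of the prescription dose, rounded to 0.1 Gy."""
    return [round(f * prescription_gy, 1) for f in fractions]


def uncertainty_map(samples):
    """Voxel-wise standard deviation across predictions.

    `samples` is a list of flattened dose predictions (one per dropout pass,
    snapshot model, or posterior draw). Population std is an assumption.
    """
    return [pstdev(voxel) for voxel in zip(*samples)]


def error_map(predicted, ground_truth):
    """Voxel-wise absolute difference between predicted and ground-truth dose."""
    return [abs(p - g) for p, g in zip(predicted, ground_truth)]


def overlap_percent(uncertainty, error, error_thr, unc_thr):
    """Assumed overlap definition: intersection over union, in percent, of
    voxels with error > error_thr and voxels with uncertainty > unc_thr."""
    high_err = [e > error_thr for e in error]
    high_unc = [u > unc_thr for u in uncertainty]
    both = sum(a and b for a, b in zip(high_err, high_unc))
    either = sum(a or b for a, b in zip(high_err, high_unc))
    return 100.0 * both / either if either else 0.0


def pearson(x, y):
    """Pearson correlation coefficient between two equally sized voxel lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for a constant map
    return cov / (sx * sy)
```

With the 6996 cGy prescription, `thresholds_gy()` reproduces the 3.5 Gy, 7 Gy, and 14 Gy levels used in the study. In practice the maps would be 3D arrays handled with an array library; plain lists are used here only to keep the sketch dependency-free.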
Results: At the 7 Gy threshold, the combined approach (MCD, SE, and BNN together) achieved the highest overlap (36.19%) and Pearson correlation (0.92). BNN achieved 35.69% overlap and 0.87 correlation, MCD 33.28% overlap and 0.26 correlation, and SE 17.96% overlap and 0.22 correlation. At 3.5 Gy, the combined model showed strong alignment, with an overlap of 46.06% and a Pearson correlation of 0.89. Although overlap decreased at 14 Gy, the combined model and BNN maintained strong
Conclusion: The combined use of MCD, SE, and BNN provides the most robust alignment between uncertainty and error, achieving the highest overlap and strong positive Pearson correlation across all thresholds. These findings highlight the potential of combining uncertainty estimation methods to enhance clinical trust in AI-generated dose distributions. Future work will focus on refining uncertainty models and integrating them into clinical workflows for real-time error detection.
