Transformer-Based Proton Dose Prediction with and without Diffusion Process

Authors: Jing Qian, Brandon Reber, David M. Routman, Satomi Shiraishi

Affiliation: Mayo Clinic

Abstract:

Purpose: The dose distribution in proton radiotherapy (PRT) is characterized by sharp gradients, which pose a challenge for machine learning-based dose prediction. While denoising with a diffusion process may enhance spatial resolution, its potential benefits for proton dose prediction remain unexplored. In this study, we trained a hybrid U-Net Transformer with and without the diffusion process and evaluated the models' performance on a cohort of head and neck (HN) cancer patients treated with definitive PRT.
Methods: An HN cancer cohort of 110 cases treated with definitive PRT at our clinic from 2013 to 2024 was retrospectively identified and split into training/validation/test sets of 75/10/25 cases. CT images, region-of-interest (ROI) masks, and ROI signed distance maps of 128×128 pixels were used as model input. Models were trained using either axial or sagittal slices. A U-Net architecture with a vision transformer bottleneck was trained with and without diffusion. Model performance was evaluated using mean absolute error (MAE), structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR), CTV high ΔD95%, spinal cord ΔDmax, and mandible ΔDmax.
Results: The trained models showed similar overall performance. The diffusion model trained on axial slices outperformed the other models in high-dose regions, but not necessarily in low-dose regions. The axial diffusion model had a test set MAE of 1.506±0.325 Gy, SSIM of 0.870±0.031, PSNR of 24.815±1.496, CTV high ΔD95% of 1.900±1.491 Gy, spinal cord ΔDmax of 5.791±4.500 Gy, and mandible ΔDmax of 2.312±1.652 Gy. The dose volume inference time was 4.26 s with diffusion and 0.53 s without.
Conclusion: We evaluated the impact of the diffusion process on proton dose prediction. While the diffusion process improves accuracy in high-dose regions, it increases the inference time from 0.53 s to 4.26 s per dose volume. Future work will focus on improving the backbone model and the diffusion process.
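
The voxel-wise and dose-volume metrics named in Methods can be illustrated with a minimal sketch. The abstract does not give the exact definitions used, so everything below is an assumption: dose is a flat list of voxel values in Gy, PSNR uses the maximum reference dose as its data range, D95% is taken as the minimum dose received by the hottest 95% of voxels in a structure, and ΔD95% and ΔDmax are absolute differences between predicted and reference values. All function names are hypothetical.

```python
import math


def mean_absolute_error(pred, ref):
    """Mean absolute voxel-wise dose difference (Gy)."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def psnr(pred, ref, data_range=None):
    """Peak signal-to-noise ratio in dB.

    Assumption: the data range defaults to the maximum reference dose.
    """
    if data_range is None:
        data_range = max(ref)
    mse = sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)
    if mse == 0:
        return math.inf
    return 10 * math.log10(data_range ** 2 / mse)


def d95(dose, mask):
    """Minimum dose received by the hottest 95% of voxels inside the mask."""
    values = sorted((d for d, m in zip(dose, mask) if m), reverse=True)
    k = (95 * len(values) + 99) // 100  # integer ceil(0.95 * n)
    return values[k - 1]


def dmax(dose, mask):
    """Maximum dose inside the mask."""
    return max(d for d, m in zip(dose, mask) if m)


def delta(metric, pred, ref, mask):
    """Absolute difference of a dose-volume metric (assumed definition)."""
    return abs(metric(pred, mask) - metric(ref, mask))


# Toy example with made-up values, not data from the study.
ref_dose = [70.0, 70.0, 60.0, 10.0, 0.0, 0.0]
pred_dose = [68.0, 71.0, 61.0, 12.0, 0.0, 1.0]
target_mask = [1, 1, 1, 0, 0, 0]

mae_value = mean_absolute_error(pred_dose, ref_dose)
psnr_value = psnr(pred_dose, ref_dose)
delta_d95 = delta(d95, pred_dose, ref_dose, target_mask)
delta_dmax = delta(dmax, pred_dose, ref_dose, target_mask)
```

SSIM is omitted here because it depends on windowing choices that the abstract does not specify.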
