Patient-Specific Deep Reinforcement Learning Framework for Automatic Replanning in Proton Therapy for Head-and-Neck Cancer

Authors: Malvern Madondo, Mark McDonald, Zhen Tian, Christopher Valdes, Ralph Weichselbaum, Xiaofeng Yang, David Yu, Jun Zhou

Affiliations: Department of Radiation & Cellular Oncology, University of Chicago; Emory University; Department of Radiology, University of Chicago; Department of Radiation Oncology and Winship Cancer Institute, Emory University

Abstract:

Purpose: Head-and-neck (HN) cancer patients often experience significant anatomical changes during the treatment course. Proton therapy, particularly intensity-modulated proton therapy (IMPT), is highly sensitive to these changes and often requires replanning, which is resource-intensive and time-consuming. This study introduces a patient-specific deep reinforcement learning (DRL) framework for automatic replanning that tailors model performance to each patient's anatomical characteristics.

Methods: DRL agents were employed to interact with a plan optimization engine and learn a priority-tuning policy that maximizes cumulative reward. Dose-volume histograms (DVHs) of the clinical target volumes (CTVs) and organs at risk (OARs) served as the state, and the action space comprised 17 predefined priority adjustments. A 150-point scoring system combining ProKnow criteria and institutional guidance was designed to assess plan quality at each tuning step, with the reward defined as the change in quality score between steps. A retrospective study was conducted on five HN cancer patients treated with IMPT who required replanning due to anatomical changes. Two DRL agents, Deep Q-Network (DQN) and Proximal Policy Optimization (PPO), were implemented for comparison. Both agents were trained on each patient's original planning CT images and contours, together with two augmented datasets simulating tumor regression and progression. The replanning CT images acquired during the treatment course were used to evaluate both agents.
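
To make the state, action, and reward setup above concrete, below is a minimal Gym-style sketch of the tuning environment in Python. The optimizer wrapper, scoring function, and plan interface (`optimizer`, `score_plan`, `plan.dvh`) are illustrative assumptions for exposition, not the authors' implementation; a standard DQN or PPO agent would then interact with `ReplanEnv` as with any episodic environment.

```python
import numpy as np

N_ACTIONS = 17      # predefined priority adjustments (per the abstract)
MAX_SCORE = 150.0   # plan-quality scale (ProKnow criteria + institutional guidance)

class ReplanEnv:
    """One episode = one priority-tuning session for a single patient plan.

    `optimizer`, `score_plan`, and the `plan` object (with `.dvh`, `.ctvs`,
    `.oars`) are hypothetical stand-ins for the plan optimization engine
    and scoring system described in the abstract.
    """

    def __init__(self, optimizer, score_plan, actions):
        assert len(actions) == N_ACTIONS
        self.optimizer = optimizer    # wraps the plan optimization engine
        self.score_plan = score_plan  # maps a plan to a 0-150 quality score
        self.actions = actions        # the 17 priority-adjustment operations
        self.prev_score = None

    def reset(self):
        # Initial plan is generated with default priority settings.
        plan = self.optimizer.optimize_with_default_priorities()
        self.prev_score = self.score_plan(plan)
        return self._state(plan)

    def step(self, action_idx):
        # Apply the chosen priority adjustment, then re-optimize the plan.
        self.actions[action_idx](self.optimizer)
        plan = self.optimizer.optimize()
        score = self.score_plan(plan)
        reward = score - self.prev_score  # reward = change in quality score
        self.prev_score = score
        done = score >= MAX_SCORE         # or a step-budget cutoff
        return self._state(plan), reward, done

    def _state(self, plan):
        # State: sampled DVH curves of the CTVs and OARs, concatenated
        # into a fixed-length feature vector for the DQN/PPO policy.
        dvhs = [np.asarray(plan.dvh(s), dtype=np.float32)
                for s in plan.ctvs + plan.oars]
        return np.concatenate(dvhs)
```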

Results: Initial plans, created with default priority settings, had an average score of 130.41 ± 3.09. Both DQN and PPO agents automatically adjusted priorities, achieving average scores of 143.48 ± 3.65 and 144.85 ± 3.69, respectively, surpassing the plans manually generated by human planners (140.79 ± 4.43).

Conclusion: Our results demonstrated that the DRL-based automatic replanning approach can improve the clinical efficiency of offline adaptive proton therapy while ensuring consistently high plan quality. It also paves the way for the development of online adaptive proton therapy.
