Large Language Model Agents for Automated Radiotherapy Planning: A Knowledge-Enhanced Reinforcement Learning Approach

Authors: Hassan Bagher-Ebadian, Anthony J. Doemer, Ryan Hall, Joshua P. Kim, Bing Luo, Benjamin Movsas, Humza Nusrat, Kundan S Thind

Affiliation: Department of Physics, Toronto Metropolitan University, Henry Ford Health

Abstract:

Purpose: This study investigates the development and feasibility of locally hosted LLM-based agents for automating radiotherapy treatment planning, with the aim of improving planning efficiency and consistency while preserving patient privacy.
Methods: An LLM agent was developed comprising working memory, an LLM, and tools for interacting with a commercial treatment planning system (Eclipse, Varian). The agent was deployed on a retrospective cohort of 18 prostate cancer patients prescribed 60 Gy in 20 fractions as per the PROFIT clinical trial, and was given 10 iterations to optimize each plan. A chain-of-thought prompting approach incorporating clinical goals, retrieval-augmented generation (RAG), and reinforcement learning (RL) guided the agent's decision-making. RAG gave the agent access to the three previous planning iterations for the same patient. The RL reward function embedded the PROFIT trial requirements and prioritized planning target volume (PTV) coverage and organ-at-risk (OAR) sparing. Plan quality was assessed using a composite score incorporating PTV coverage, conformity, and OAR dose metrics. Two model sizes (Llama 3.1 with 8B and 70B parameters) and three optimization configurations (No-RAG, RAG, and RAG+RL) were compared.
Results: The larger model (70B) achieved a 16.4% (±4.5%) higher mean score than the smaller model (8B) at the final iteration. Among the optimization configurations, RAG achieved the highest overall plan quality, with a final score 11.8% (±2.2%) higher than No-RAG. RAG+RL converged faster than RAG alone, highlighting the synergy between retrieval-based memory and reinforcement learning.
Conclusion: This work presents the first successful implementation of local LLM agents for privacy-centric autonomous treatment plan optimization in a commercially available treatment planning system. The integration of RAG and RL, coupled with chain-of-thought prompting, enables the LLM agent to learn from past experience and optimize treatment plans effectively. Further research is warranted to validate this approach in larger patient cohorts and to explore its generalizability to other tumor sites and treatment modalities.
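
Illustrative sketch (not the authors' implementation): the abstract does not give the agent's code, the score formula, or the reward definition, so the Python below is only a minimal sketch of the loop described in Methods under stated assumptions. The names `PlanMetrics`, `composite_score`, and `run_agent`, the score weights, and the choice of "score improvement over the previous iteration" as the reward are all hypothetical. The `propose` and `evaluate` callables stand in for the LLM call and the treatment planning system, respectively; no real Eclipse or Llama API is used.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PlanMetrics:
    """Hypothetical, normalized plan metrics (all in the range 0..1)."""
    ptv_coverage: float  # fraction of the PTV receiving the prescription dose
    conformity: float    # conformity index, 1.0 is ideal
    oar_dose: float      # normalized OAR dose, lower is better


def composite_score(m, w_ptv=0.5, w_conf=0.2, w_oar=0.3):
    """Weighted plan score. The weights are illustrative assumptions,
    not values from the study."""
    return w_ptv * m.ptv_coverage + w_conf * m.conformity + w_oar * (1.0 - m.oar_dose)


def run_agent(propose, evaluate, n_iterations=10, memory_size=3):
    """Run the optimization loop for one patient.

    propose(context) -> parameters: stand-in for the LLM call; `context`
        holds up to `memory_size` previous iterations (the retrieval step).
    evaluate(parameters) -> PlanMetrics: stand-in for running the optimizer
        in the treatment planning system.
    Returns the list of composite scores, one per iteration.
    """
    memory = deque(maxlen=memory_size)  # keeps only the most recent iterations
    scores = []
    previous_score = None
    for iteration in range(n_iterations):
        context = list(memory)
        parameters = propose(context)
        metrics = evaluate(parameters)
        score = composite_score(metrics)
        # Assumed reward signal: improvement over the previous iteration.
        reward = 0.0 if previous_score is None else score - previous_score
        memory.append({
            "iteration": iteration,
            "parameters": parameters,
            "score": score,
            "reward": reward,
        })
        scores.append(score)
        previous_score = score
    return scores
```

With `memory_size=3` and `n_iterations=10`, the defaults mirror the three-iteration retrieval window and the 10-iteration budget stated in Methods; everything else is a placeholder to be replaced by the actual planning tools and reward definition.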
