Author: Steve B. Jiang, Mu-Han Lin, Dan Nguyen, Beiqian Qi, Daniel Yang, Ying Zhang
Affiliation: Medical Artificial Intelligence and Automation (MAIA) Lab, Department of Radiation Oncology, UT Southwestern Medical Center
Purpose:
Online adaptive radiotherapy (oART) is a resource-intensive workflow that demands significant time and effort from clinicians, particularly for the online evaluation of plan quality. Artificial intelligence (AI) agents offer promising potential for replicating human-like reasoning and decision-making. By harnessing AI, it may be possible to automate the assessment and comparison of treatment plans, reducing the clinical resources needed for oART delivery. This study explores the feasibility of using a large language model (LLM) AI agent for oART plan evaluation.
Methods:
Five prostate SBRT patients who underwent oART on a Unity MR-Linac were included for analysis, with each patient case comprising three plans: (1) Clinical oART Plan; (2) Orig-seg Plan: the reference plan recalculated on the daily image without any corrections; and (3) Adapt-seg Plan: the dose recalculated with shift corrections applied. Dosimetric criteria and objective tolerances were provided to an LLM (GPT-4o) for data extraction and comparison. Detailed instructions were included in the prompts, specifying table formatting and comparison logic. The performance of the LLM was evaluated on its accuracy in extracting and organizing the data and in producing valid plan comparison results.
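As a rough illustration of this prompting workflow (not the study's actual prompts, criteria, or tolerance values, which are not given in the abstract), a minimal sketch using the OpenAI Python client might look like the following; all criteria values and prompt wording are hypothetical placeholders:

```python
# Illustrative sketch only: criteria, tolerances, and prompt wording below are
# hypothetical placeholders, not the study's clinical protocol or actual prompts.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical dosimetric criteria with objective tolerances (example values only).
criteria = """\
PTV: D95% >= 36.25 Gy (tolerance: >= 34.4 Gy)
Rectum: V36Gy < 1 cc (tolerance: < 2 cc)
Bladder: V37Gy < 10 cc (tolerance: < 15 cc)
"""

# Raw dosimetric values for the three plans would be exported from the treatment
# planning system; placeholder text stands in for that export here.
plan_data = "<dosimetric values for Clinical oART, Orig-seg, and Adapt-seg plans>"

prompt = f"""You are evaluating three adaptive radiotherapy plans.
Criteria and tolerances:
{criteria}
Plan data:
{plan_data}

Instructions:
1. Summarize each criterion and each plan's value in a single table.
2. Label each value as PASS, WITHIN TOLERANCE, or VIOLATION.
3. Explain any violations and conclude which plan is preferable.
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```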
Results:
The AI agent successfully summarized the dosimetric criteria into a table, applied color coding to indicate pass, within tolerance, or violation, provided reasonable explanations for any violations, and offered a comparative conclusion across the three plans. However, challenges were encountered, including missing information, difficulty following the specified comparison logic, inconsistent color coding, and hallucinations (e.g., generating random content or providing incorrect reasoning).
Conclusion:
This study indicates that, with proper instruction, AI agents can effectively support oART plan evaluation. However, their role as the sole "judge" remains limited given these challenges, underscoring the need for a robust quality assurance mechanism for AI agents before clinical implementation.