An Automated Tool for the Categorization of a Clinical Database By Anatomic Region for Big Data Applications

Author: Yasin Abdulkadir, Justin Hink, James M. Lamb, Jack Neylon

Affiliation: Department of Radiation Oncology, University of California, Los Angeles

Abstract:

Purpose: Curation remains a significant barrier to the use of 'big data' radiotherapy planning databases of 100,000 patients or more. Anatomic site of treatment is an important stratification variable for almost all downstream analyses. Current automated methods depend heavily on parsing plan labels or primary target volumes. This is challenging without strict institutional naming conventions, and nearly impossible for broad and diverse multi-institutional data. To address this, we developed in-house software to automate and standardize the labeling of treatment plans by anatomic region.
Methods: Our software processes DICOM files in bulk, applying segmentation models to map 117 structures, including organs, glands, and bones, split into six anatomic groups (brain, head and neck, pelvis, abdomen, thorax, limb) using nomenclature from Task Group 263. RTDose isodoses of 95%, 80%, and 50% are overlaid sequentially onto each organ segmentation until at least one organ achieves a Dice score above a minimum threshold. Plans are then assigned to anatomic groups based on the top five scores exceeding this threshold. The algorithm was trained on 104 cases with manually curated ground truths and tested on a cohort of 20 consecutive treatment plans from our CT simulation schedule.
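The sequential isodose step described in Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `reference_dose` used to convert isodose percentages into absolute dose, and the default `min_dice` value of 0.5 are all assumptions made for illustration, since the abstract does not state them. Organ masks and the dose grid are assumed to be NumPy arrays already resampled to a common shape.

```python
import numpy as np

def dice(a, b):
    """Dice coefficient 2|A and B| / (|A| + |B|) for two boolean masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 0.0  # both masks empty: treat as no overlap
    return 2.0 * np.logical_and(a, b).sum() / total

def label_plan(dose, organ_masks, organ_to_group, reference_dose,
               levels=(0.95, 0.80, 0.50), min_dice=0.5, top_k=5):
    """Return (anatomic groups, isodose level used) for one plan.

    levels follow the abstract (95%, 80%, 50%); min_dice and the use of
    reference_dose are hypothetical choices, not values from the source.
    """
    dose = np.asarray(dose)
    for level in levels:
        # Binary isodose volume at this fraction of the reference dose.
        isodose = dose >= level * reference_dose
        scores = {name: dice(isodose, mask) for name, mask in organ_masks.items()}
        passing = {name: s for name, s in scores.items() if s > min_dice}
        if passing:
            # Keep the top_k highest-scoring organs and map them to groups.
            top = sorted(passing, key=passing.get, reverse=True)[:top_k]
            return {organ_to_group[name] for name in top}, level
    return set(), None  # no organ exceeded the threshold at any level
```

Stopping at the first isodose level where any organ passes means tightly conformal plans are labeled from the high-dose region, while the lower levels act as a fallback when the high-dose volume is much smaller than any segmented organ.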
Results: The algorithm's performance was evaluated using two metrics: exact matches to the ground truth anatomic groups and partial matches, where the predicted anatomic groups were incomplete but entirely contained within the ground truth. On the test dataset, the algorithm achieved a 90% exact match rate and a 95% partial match rate.
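The two evaluation metrics can be expressed as set comparisons between predicted and ground-truth anatomic groups. The sketch below rests on two assumptions. First, because the reported rates (90% and 95%) sum to more than 100%, the partial match rate is read here as counting exact matches as well. Second, an empty prediction is not counted as a partial match. The function names and labels are hypothetical.

```python
def match_type(predicted, truth):
    """Classify one plan as 'exact', 'partial', or 'miss'."""
    predicted, truth = set(predicted), set(truth)
    if predicted == truth:
        return "exact"
    # Partial: a non-empty prediction that is a proper subset of the truth.
    if predicted and predicted < truth:
        return "partial"
    return "miss"

def match_rates(pairs):
    """Return (exact rate, exact-or-partial rate) over (predicted, truth) pairs."""
    kinds = [match_type(p, t) for p, t in pairs]
    n = len(kinds)
    exact = kinds.count("exact")
    partial = kinds.count("partial")
    return exact / n, (exact + partial) / n
```

Under this reading, a 20-plan test set with 18 exact matches, 1 partial match, and 1 miss would give the rates reported above.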
Conclusion: The workflow labeled treatment plans into six anatomic regions with a 90% exact match rate and a 95% partial match rate on the 20-plan test dataset, which suggests potential for clinical application. However, at the current runtime of approximately 500 seconds per plan, a database of 100,000 patients would require about 5 × 10^7 seconds (roughly 580 days) of serial processing even at one plan per patient, so further optimization is necessary to reach the desired scale.
