Author: Jie Deng, Yunxiang Li, Xiao Liang, Weiguo Lu, Jiacheng Xie, You Zhang
Affiliation: Medical Artificial Intelligence and Automation (MAIA) Laboratory, Department of Radiation Oncology, UT Southwestern Medical Center; The University of Texas at Dallas
Purpose: Foundational models trained on large datasets have recently shown remarkable performance across various tasks. A foundational model for medical image modality translation in head-and-neck radiotherapy enables the prediction of imaging modalities that are missing (due to hardware limitations, workflow constraints, or medical costs) from the modalities that are available. Such a model can promote more accurate cross-modality segmentation, multimodal registration, and MR-only treatment planning.
Methods: We propose a foundational model called Translate-Any-Modality (TAM), trained on eight distinct magnetic resonance imaging (MRI) sequences and the corresponding computed tomography (CT) images of head-and-neck patient cases. TAM is designed for flexible many-to-many modality translation, predicting any target modality from an arbitrary number of input modalities. During training, TAM likewise accepts flexible inputs and does not require all eight MRI sequences plus CT for every patient case, allowing it to use as many training cases as possible. TAM employs a two-stage architecture. In Stage I, a 3D UNet performs image segmentation, providing anatomical cues that preserve structural integrity during modality translation. In Stage II, a diffusion model synthesizes the target modality, using the input modalities and the Stage I segmentation results as conditional guidance. Trained on a comprehensive head-and-neck dataset, TAM was evaluated on 46 modality translation tasks and benchmarked against state-of-the-art methods using metrics including the peak signal-to-noise ratio (PSNR) and the 3D gamma index (for dose calculation evaluations).
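To make the two-stage design concrete, the sketch below shows one way the conditioning could be wired up in PyTorch: a placeholder 3D segmenter produces anatomical guidance from whichever input modalities are present (missing modalities zero-filled and masked out), and a placeholder denoiser predicts noise on the target modality from the inputs plus that guidance. All module names, channel counts, and the masking scheme are illustrative assumptions, not the actual TAM implementation.

```python
import torch
import torch.nn as nn

NUM_MODALITIES = 9  # 8 MRI sequences + CT, per the abstract


class Simple3DUNet(nn.Module):
    """Placeholder Stage-I segmenter: a shallow 3D encoder-decoder (assumption)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose3d(32, 16, 2, stride=2), nn.ReLU(),
            nn.Conv3d(16, out_ch, 3, padding=1))

    def forward(self, x):
        return self.dec(self.enc(x))


class CondDenoiser(nn.Module):
    """Placeholder Stage-II denoiser: predicts noise on the target modality,
    conditioned on the available inputs and the Stage-I segmentation."""

    def __init__(self, cond_ch, seg_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1 + cond_ch + seg_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 1, 3, padding=1))

    def forward(self, noisy_target, cond, seg, t):
        # A real diffusion model would embed the timestep t; omitted in this sketch.
        return self.net(torch.cat([noisy_target, cond, seg], dim=1))


def translate_step(inputs, avail_mask, segmenter, denoiser, noisy_target, t):
    """One denoising step of the assumed pipeline.
    inputs:     (B, NUM_MODALITIES, D, H, W), missing modalities zero-filled
    avail_mask: (B, NUM_MODALITIES, 1, 1, 1), marking which modalities are present
    """
    cond = inputs * avail_mask                   # flexible many-to-many conditioning
    seg = segmenter(cond)                        # Stage I: anatomical guidance
    return denoiser(noisy_target, cond, seg, t)  # Stage II: noise prediction


if __name__ == "__main__":
    seg_classes = 4  # hypothetical number of anatomical labels
    segmenter = Simple3DUNet(NUM_MODALITIES, seg_classes)
    denoiser = CondDenoiser(NUM_MODALITIES, seg_classes)
    x = torch.randn(1, NUM_MODALITIES, 16, 16, 16)
    mask = torch.ones(1, NUM_MODALITIES, 1, 1, 1)
    mask[:, 3:] = 0                              # pretend only 3 modalities are available
    noisy = torch.randn(1, 1, 16, 16, 16)
    eps_hat = translate_step(x, mask, segmenter, denoiser, noisy, t=torch.tensor([10]))
    print(eps_hat.shape)                         # torch.Size([1, 1, 16, 16, 16])
```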
Results: TAM demonstrated superior performance across all 46 tasks, achieving an average (±s.d.) PSNR of 29.64±6.13, compared with 25.23±14.34 for CycleGAN. For radiotherapy dose evaluation using synthetic CTs generated from MRIs, TAM achieved an average (±s.d.) gamma pass rate of 96.42±2.73% under the 1%/1mm criterion, whereas CycleGAN achieved only 92.48±6.97%.
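For reference, the per-volume PSNR used above is 10·log10(MAX²/MSE) between a synthesized and a reference image. The snippet below is a minimal sketch, assuming intensities normalized to a known data range; the abstract does not specify the normalization or averaging protocol actually used.

```python
import numpy as np


def psnr(pred: np.ndarray, ref: np.ndarray, data_range: float) -> float:
    """Peak signal-to-noise ratio (dB) between a synthesized and a reference volume.
    data_range is the maximum possible intensity span (an assumption here)."""
    mse = np.mean((pred.astype(np.float64) - ref.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((data_range ** 2) / mse)


# Example with synthetic volumes normalized to [0, 1]
rng = np.random.default_rng(0)
ref = rng.random((32, 32, 32))
pred = ref + 0.05 * rng.standard_normal((32, 32, 32))
print(f"PSNR: {psnr(pred, ref, data_range=1.0):.2f} dB")
```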
Conclusion: TAM enables flexible image modality translation for head-and-neck applications with exceptional performance. By combining physically acquired and virtually generated imaging modalities, TAM paves the way for more advanced multimodality image-guided radiotherapy.