Author: Zachary Buchwald, Chih-Wei Chang, Richard L.J. Qiu, Mojtaba Safari, Hui-Kuo Shu, Lisa Sudmeier, Xiaofeng Yang, David Yu, Xiaohan Yuan
Affiliation: Department of Radiation Oncology and Winship Cancer Institute, Emory University; Georgia Institute of Technology
Purpose: This study proposes a novel vision-language model (VLM) to predict survival outcomes in glioblastoma (GBM) patients. By integrating multimodal MRI data and clinical information, the proposed model aims to improve predictive accuracy and provide insights into prognostic factors, addressing the need for personalized treatment strategies in GBM management.
Methods: The study utilized the publicly available UCSF-PDGM dataset, comprising data from 500 patients. Clinical information, including age, MGMT gene status, gender, pathological diagnosis, and WHO grade, was used to generate text prompts. A Bio-Clinical-BERT pre-trained text encoder extracted features from the clinical data, while vision features were derived from 64×64×64 voxel image patches centered on the tumor in four volumetric MRI modalities: ADC, T2-weighted, T1 contrast-enhanced, and mean diffusivity. We designed a VLM that integrates these multimodal MRI data to predict GBM survival outcomes. We first employed a bilinear combination method, which extracts image tokens from each modality and fuses them in the embedding space. Vision and text features were then combined using feature-wise linear modulation (FiLM). The proposed VLM was compared with a baseline vision-only model, and all models were trained using 5-fold cross-validation.
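The clinical-to-text prompting step might look like the following sketch. The template wording and field names (`age`, `gender`, `diagnosis`, `who_grade`, `mgmt_status`) are illustrative assumptions, not the study's exact prompt format; the resulting string would then be tokenized for the Bio-Clinical-BERT text encoder.

```python
# Hypothetical sketch: serializing structured clinical fields into a text
# prompt for a Bio-Clinical-BERT-style text encoder. The template and field
# names are illustrative assumptions, not the study's exact format.

def build_prompt(record: dict) -> str:
    """Turn one patient's clinical record into a natural-language prompt."""
    return (
        f"Patient is a {record['age']}-year-old {record['gender']} "
        f"with a pathological diagnosis of {record['diagnosis']}, "
        f"WHO grade {record['who_grade']}, "
        f"MGMT promoter {record['mgmt_status']}."
    )

example = {
    "age": 63,
    "gender": "female",
    "diagnosis": "glioblastoma",
    "who_grade": "IV",
    "mgmt_status": "methylated",
}
prompt = build_prompt(example)
```

In practice each such prompt would be fed through the pre-trained text encoder, whose pooled output serves as the text feature vector for fusion.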
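A minimal NumPy sketch of the two fusion stages follows, under the assumptions that "bilinear combination" fuses per-modality image tokens through a learned bilinear interaction in the embedding space, and that FiLM applies a text-conditioned per-channel scale and shift. The dimensions, random weights, and projection layers are all illustrative stand-ins for trained parameters, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 32          # embedding dimension (illustrative)
n_tokens = 8    # image tokens per modality (illustrative)

# Image tokens from the four MRI modalities (ADC, T2-weighted, T1
# contrast-enhanced, mean diffusivity), e.g. from a patch-based encoder.
modalities = [rng.standard_normal((n_tokens, d)) for _ in range(4)]

# Bilinear-style combination: fuse modality tokens in the embedding space
# via a bilinear form W (random here, standing in for a trained weight).
W = rng.standard_normal((d, d)) / np.sqrt(d)

def bilinear_fuse(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Elementwise bilinear interaction between two token sets: (a @ W) * b."""
    return (a @ W) * b

fused = modalities[0]
for m in modalities[1:]:
    fused = bilinear_fuse(fused, m)          # shape (n_tokens, d)

# FiLM: a text embedding (e.g. pooled Bio-Clinical-BERT output) predicts
# per-channel scale (gamma) and shift (beta) that modulate vision features.
text_emb = rng.standard_normal(d)
W_gamma = rng.standard_normal((d, d)) / np.sqrt(d)
W_beta = rng.standard_normal((d, d)) / np.sqrt(d)
gamma = text_emb @ W_gamma                   # shape (d,)
beta = text_emb @ W_beta                     # shape (d,)

film_out = gamma * fused + beta              # broadcast over all tokens
```

The modulated tokens would then feed a survival-classification head; the pairwise chaining of the four modalities shown here is one plausible reading of the fusion order, chosen for simplicity.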
Results: The VLM utilizing the bilinear combination and FiLM methods achieved significantly higher accuracy and AUC (70.80%±4.07% and 0.67±0.06) than the baseline vision encoder (60.80%±0.98% and 0.59±0.02), with p-values of 0.01 and 0.03, respectively. Our approach also yielded a substantial 17% improvement in accuracy over previous deep learning studies using the same dataset.
Conclusion: The proposed VLM leverages heterogeneous data, combining MRI with clinical information, to improve prediction of GBM patient survival outcomes compared with methods that use image features alone. These findings highlight the potential of VLMs to enhance survival prediction and support the development of personalized GBM treatment strategies.