Multi-Instance Learning Model with 2D and 3D Feature Representation and Transformer-Based Prediction for FDG PET Tumor Chemoradiation Response of LA-NSCLC

Authors: Stephen R. Bowen, Chunyan Duan, Daniel S. Hippe, Qiantuo Liu, Jiajie Wang, Shouyi Wang, Faisal Yaseen, Han Zhou

Affiliation: Tongji University, University of Washington, Department of Radiation Oncology, Fred Hutchinson Cancer Center, University of Washington, Fred Hutchinson Cancer Center, University of Texas at Arlington

Abstract:

Purpose: Predicting the spatiotemporal tumor response to chemoradiation can guide radiation dose adjustment and support clinical decision-making in radiotherapy. A multi-instance learning model integrating multidimensional features with a Transformer-based architecture was proposed to classify mid-chemoradiation tumor response from pre-treatment FDG PET (PETpre) and radiation dose distribution (Dose) in patients with locally advanced non-small cell lung cancer (LA-NSCLC).
Methods: The PETpre and Dose volumetric data from 23 patients (Clinical trial: NCT02773238) were sliced, masked, and fused to generate input image data, which were subsequently fed into convolutional networks to obtain combined 2D and 3D multidimensional feature embeddings. These embeddings were concatenated and passed through a Transformer-based module with residual connections, in which a multi-head attention mechanism updated the encoded information. Using an embedding-level multi-instance learning approach, bag-level representations were used to predict mid-treatment response on PETmid at the patient level. Response was defined as a ≥20% decrease in SUVmean from PETpre to PETmid, i.e., ∆SUVmean ≤ −20%, where ∆SUVmean = [SUVmean_mid − SUVmean_pre]/SUVmean_pre. The performance of the proposed Transmil2D3D classification model was evaluated by area under the ROC curve (AUC) and accuracy under a leave-one-out cross-validation strategy. Transmil2D3D was compared with ablation models (Transmil, Transmil2D and Transmil3D) and classical models (GABMIL, ABMIL and 3D CNN).
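The response definition in the Methods can be written out as a short calculation. The following is a minimal sketch in Python, not the authors' code: the function names `delta_suvmean` and `is_responder` and the `threshold` parameter are hypothetical and introduced only for illustration, and the SUVmean values in the usage note are made up.

```python
def delta_suvmean(suv_pre: float, suv_mid: float) -> float:
    """Relative change in SUVmean from PETpre to PETmid:
    (SUVmean_mid - SUVmean_pre) / SUVmean_pre."""
    if suv_pre <= 0:
        raise ValueError("SUVmean_pre must be positive")
    return (suv_mid - suv_pre) / suv_pre


def is_responder(suv_pre: float, suv_mid: float, threshold: float = 0.20) -> bool:
    """A patient counts as a responder when SUVmean decreases by at least
    `threshold` (20% in the abstract), i.e. delta <= -threshold."""
    return delta_suvmean(suv_pre, suv_mid) <= -threshold
```

For example, an illustrative SUVmean of 10.0 before treatment and 7.0 at mid-treatment gives a relative change of about −30% and would be labeled a response, whereas 10.0 to 9.0 (about −10%) would not.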
Results: Of the 23 patients, 22 demonstrated response on PETmid. The proposed Transmil2D3D model achieved good performance in response prediction (AUC: 0.81; accuracy: 0.70), with an AUC higher than those of all comparison models. Specifically, Transmil2D3D showed an incremental improvement over the Transmil3D ablation model (AUC: 0.78) and substantial improvements over the other models (AUC: 0.36–0.73).
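For readers less familiar with the AUC metric reported above, the sketch below computes it from patient-level scores as the fraction of responder/non-responder pairs that are ranked correctly, with ties counted as half. It assumes, and this is not stated in the abstract, that the held-out score from each leave-one-out fold is pooled across patients before a single AUC is computed. The function name `auc_from_scores` and the example labels and scores are hypothetical.

```python
from typing import Sequence


def auc_from_scores(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC: probability that a positive case
    receives a higher score than a negative case, ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical pooled leave-one-out outputs: one held-out score per patient.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]
example_auc = auc_from_scores(example_labels, example_scores)
```

Because each leave-one-out fold holds out a single patient, a per-fold AUC is undefined, which is why this sketch pools the held-out scores first.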
Conclusion: Integrating 2D and 3D multidimensional imaging features can enhance the accuracy of image representation and recognition. The proposed tumor response prediction model, which combines key feature analysis with patient-level response classification, can offer personalized support for treatment decision-making.
