Authors: Hania A. Al-Hallaq, Xuxin Chen, Anees H. Dhabaan, Elahheh (Ella) Salari, Xiaofeng Yang
Affiliation: Department of Radiation Oncology and Winship Cancer Institute, Emory University
Purpose:
Radiomics image analysis could enable the development of predictive signatures and personalized radiotherapy treatments. However, hand-crafted radiomics features are known to be sensitive to variations in tumor delineation, which can undermine the reliability of the analysis. To our knowledge, this is the first study to investigate how contouring variability affects deep learning (DL) features extracted from computed tomography (CT) scans.
Methods:
This study introduces a machine-learning framework for automatic image interpretation based on DL features. A dataset of 627 head and neck squamous cell carcinoma (HNSCC) cases was obtained from the HNSCC online database. To simulate contouring variability, the original tumor contours were expanded and contracted by 2 mm and 3 mm (±2 mm and ±3 mm margins). Pre-trained convolutional neural networks, ResNet50 and VGG16, were used to extract DL features from each contour variant. Dimensionality reduction was performed using recursive feature elimination with cross-validation (RFE-CV) with a random forest classifier as the estimator; RFE-CV was repeated five times, and the most frequently selected features were retained. A random forest classification (RFC) model was then built on the selected features. Ten-fold cross-validation was used to tune hyperparameters and evaluate the models, and the areas under the receiver operating characteristic curve (AUCs) obtained for the contour variants were compared using the Kruskal-Wallis test.
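The feature-selection and evaluation steps can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: a synthetic matrix stands in for the ResNet50 or VGG16 features of one contour variant, and the vote threshold `min_count`, the RFE `step`, the number of trees, and the hyperparameter grid are hypothetical choices. The abstract does not say how tuning and evaluation were separated within the 10-fold cross-validation, so the sketch nests a small grid search inside the outer 10 folds. For brevity, features are selected once on the full dataset, which can bias the AUC upward; a stricter design would repeat the selection inside each outer fold.

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score


def select_stable_features(X, y, n_repeats=5, min_count=3, n_trees=50, seed=0):
    """Repeat RFE-CV with a random forest; keep features chosen in >= min_count runs.

    min_count=3 (a majority of 5 runs) is a hypothetical threshold.
    """
    counts = Counter()
    for r in range(n_repeats):
        rfecv = RFECV(
            estimator=RandomForestClassifier(n_estimators=n_trees, random_state=seed + r),
            step=0.2,  # remove 20% of the initial feature count per elimination round
            cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=seed + r),
            scoring="roc_auc",
        )
        rfecv.fit(X, y)
        # support_ is a boolean mask of the features kept in this run.
        counts.update(np.flatnonzero(rfecv.support_).tolist())
    selected = sorted(i for i, c in counts.items() if c >= min_count)
    # Hypothetical fallback: keep the single most frequent feature if none reach min_count.
    return selected or [counts.most_common(1)[0][0]]


def evaluate_contour_variant(X, y, n_trees=50, seed=0):
    """Return selected feature indices and per-fold AUCs from nested 10-fold CV."""
    selected = select_stable_features(X, y, n_trees=n_trees, seed=seed)
    # Inner 3-fold grid search tunes the forest (hypothetical grid).
    search = GridSearchCV(
        RandomForestClassifier(n_estimators=n_trees, random_state=seed),
        param_grid={"max_depth": [None, 4]},
        scoring="roc_auc",
        cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=seed),
    )
    # Outer 10-fold CV estimates the AUC of the tuned model on held-out folds.
    outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    aucs = cross_val_score(search, X[:, selected], y, cv=outer, scoring="roc_auc")
    return selected, aucs
```

Calling `evaluate_contour_variant` once per contour variant (original, +2 mm, -2 mm, +3 mm, -3 mm) and per backbone yields per-fold AUCs that can be summarized as mean ± SD, as in the Results.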
Results:
The AUC values obtained with the original contours were 0.796 ± 0.051 for ResNet50 and 0.743 ± 0.07 for VGG16. For the ResNet50 model, the +2 mm and +3 mm margins yielded lower AUC values, whereas the -2 mm and -3 mm margins yielded higher AUC values than the original contours. Models using VGG16 features were even more robust to contour variations than those using ResNet50 features. However, none of these differences was statistically significant (p > 0.05).
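The significance test behind the reported p-value can be sketched with SciPy's `scipy.stats.kruskal`. As an assumption, since the abstract does not specify the grouping, each contour variant contributes its per-fold AUCs as one group; the dictionary keys and the 0.05 threshold are illustrative.

```python
from scipy.stats import kruskal


def compare_contour_variants(fold_aucs, alpha=0.05):
    """Kruskal-Wallis H-test across contour variants.

    fold_aucs maps a variant name (e.g. "original", "+2mm") to its per-fold AUCs.
    Returns (H statistic, p-value, whether p < alpha).
    """
    groups = [list(v) for v in fold_aucs.values()]
    if len(groups) < 2:
        raise ValueError("need at least two contour variants to compare")
    # Rank-based test: compares AUC distributions without assuming normality.
    h_stat, p_value = kruskal(*groups)
    return h_stat, p_value, bool(p_value < alpha)
```

Because the test works on ranks, it is suited to small samples such as 10 fold-level AUCs per variant, where normality is hard to justify.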
Conclusion:
Our analysis showed that contouring variations did not significantly affect the performance of models built on DL features, suggesting that DL features are robust to contouring variability and can yield consistent, reliable results.