Graph-Based Feature Selection to Improve Stability and Reproducibility of CT-Based Radiomics in Head and Neck Squamous Cell Carcinoma: A Cross-Institutional Study

Authors: Daria Gaykalova, Ranee Mehra, Jason K Molitoris, Hajar Moradmand, Lei Ren, Amit Sawant, Phuoc Tran

Affiliations: University of Maryland School of Medicine, Maryland University Baltimore, University of Maryland, Department of Radiation Oncology, University of Maryland School of Medicine

Abstract:

Purpose: Radiomics extracts quantitative imaging biomarkers from medical images. However, maintaining the reproducibility and stability of selected features across institutions and parameter settings remains a significant challenge, hindering clinical utility. This study introduces a novel method, Graph-Based Feature Selection (Graph-FS), to improve the stability and cross-institutional reproducibility of radiomic feature selection for head and neck squamous cell carcinoma (HNSCC).

Methods: Graph-FS constructs feature graphs to identify clusters of related radiomic features and selects the most representative ones without relying on outcome labels. A total of 1,648 radiomic features were extracted from the gross tumor volumes (GTVs) of 752 HNSCC patients across three institutions. Stability was evaluated across 36 radiomics parameter settings, which varied the normalization scale, bin width, and outlier threshold, using the Jaccard Index (JI), Dice Similarity Coefficient (DSC), and Overlap Percentage (OP). Cross-institutional reproducibility was assessed using JI. Graph-FS was compared with Lasso, Boruta, Recursive Feature Elimination (RFE), and Minimum Redundancy Maximum Relevance (mRMR). The selected features were used to predict 2-year survival with Random Forest, XGBoost, and CatBoost, using 10-fold cross-validation repeated 10 times.
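The abstract describes Graph-FS only at a high level: build a graph over features, find clusters of related features, and keep representative ones without using labels. The sketch below illustrates that general idea under assumptions that are ours, not the authors': two features are linked when the absolute Pearson correlation across patients is at least a hypothetical `threshold` of 0.9, clusters are the connected components of that graph, and each cluster's representative is its highest-degree node (ties broken alphabetically). The function names `pearson` and `graph_select` are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length value lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def graph_select(features, threshold=0.9):
    """Label-free selection: one representative per cluster of correlated features.

    features: dict mapping feature name -> list of values (one per patient).
    Assumed rule: connect features with |r| >= threshold, take connected
    components as clusters, keep the highest-degree node of each component.
    """
    names = sorted(features)
    adj = {name: set() for name in names}
    for a, b in combinations(names, 2):
        if abs(pearson(features[a], features[b])) >= threshold:
            adj[a].add(b)
            adj[b].add(a)

    seen, selected = set(), []
    for start in names:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component (cluster).
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        # max() keeps the first maximal item, so sorting breaks ties alphabetically.
        selected.append(max(sorted(component), key=lambda n: len(adj[n])))
    return sorted(selected)


feats = {
    "f_a": [1, 2, 3, 4, 5],
    "f_b": [2, 4, 6, 8, 10],  # perfectly correlated with f_a
    "f_c": [1, 2, 3, 4, 6],   # r ~ 0.99 with f_a and f_b
    "f_d": [5, 1, 4, 2, 3],   # weakly correlated with the others
}
print(graph_select(feats))  # ['f_a', 'f_d']
```

Because no outcome labels enter the selection, the chosen set depends only on how features co-vary, which is one plausible reason such a scheme could be less sensitive to label noise than supervised selectors; the actual graph construction and representative criterion used by Graph-FS may differ.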

Results: Graph-FS demonstrated higher stability across diverse parameter settings and institutions, with a JI of 0.46, a DSC of 0.62, and an OP of 45.8%. In contrast, the traditional methods exhibited significantly lower stability: Boruta (JI = 0.005), Lasso (JI = 0.010), RFE-RF (JI = 0.006), and mRMR (JI = 0.014). For 2-year survival prediction, Graph-FS outperformed all other methods, achieving the highest area under the curve (AUC), 0.71, on an independent center using CatBoost. Among the other approaches, Boruta performed best, with an AUC of 0.62 on the independent center.
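For reference, the two standard set-overlap measures reported above are JI = |A ∩ B| / |A ∪ B| and DSC = 2|A ∩ B| / (|A| + |B|), where A and B are the feature sets selected under two settings. The sketch below computes both and averages them over all pairs of selections; the pairwise averaging is an assumption about how a single summary value might be obtained across the 36 settings, and OP is omitted because the abstract does not define it. The helper names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard Index: |A & B| / |A | B| (1.0 for two empty sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dice(a, b):
    """Dice Similarity Coefficient: 2|A & B| / (|A| + |B|) (1.0 for two empty sets)."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def mean_pairwise(selections, metric):
    """Average a set-similarity metric over every pair of feature selections.

    Assumption: one selection per parameter setting, all pairs weighted equally.
    """
    pairs = list(combinations(selections, 2))
    if not pairs:
        raise ValueError("need at least two selections")
    return sum(metric(x, y) for x, y in pairs) / len(pairs)


# Hypothetical selections from three parameter settings.
sels = [
    {"f1", "f2", "f3", "f4"},
    {"f1", "f2", "f3", "f5"},
    {"f1", "f2", "f6", "f7"},
]
print(round(mean_pairwise(sels, jaccard), 3))  # 0.422
print(round(mean_pairwise(sels, dice), 3))     # 0.583
```

For any single pair, DSC = 2·JI / (1 + JI), so DSC is never smaller than JI; values averaged across pairs need not satisfy this identity exactly.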

Conclusion: Graph-FS provides a robust and innovative approach to addressing reproducibility and stability challenges in radiomic feature selection. Its ability to consistently identify reliable features across diverse settings and institutions supports the clinical translation of radiomics for HNSCC and potentially other cancer types.
