Author: Eric N Carver, Julia Marks
Affiliation: Brown University
Purpose: The clinical applicability of radiomic features is hindered by challenges in stability and reproducibility. To address this, researchers are establishing image and feature standardizations and research reporting requirements. This study characterizes common techniques for feature normalization, feature selection, and modeling by investigating unique radiomics workflows to understand the impact of each component on overall model performance. As an additional novelty, the study was replicated on a subset of the data to assess how robust each technique is to dataset expansion, as would occur over the course of a research project, in an attempt to simulate clinical implementation.
Methods: Computed tomography images from 302 patients with non-small cell lung cancer were investigated by extracting radiomic features from gross tumor volumes using Pyradiomics. Two datasets were formed from the patients' features: the full set of 302 patients and a subset of it. Each dataset was then split into 80% training and 20% testing cohorts and investigated by varying the feature normalization technique, the feature selection technique, the number of features, or the model. Over 1900 unique workflows were generated, with radiomic signature sets (RSSs) derived from each unique workflow. Performance was assessed using the area under the receiver operating characteristic curve (ROC-AUC). The relative performance of the eleven feature normalization techniques was determined by sub-grouping RSSs by the technique employed. Popular feature selection techniques and models were analyzed systematically in the same way. The optimal techniques were identified by their consistent ability to generate RSSs with higher performance in both the training and testing datasets.
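The workflow grid described in Methods can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the study's code. The data are synthetic, and the two normalizers, two selectors, two models, the feature counts, and the stratified 80/20 split are hypothetical stand-ins for the techniques actually compared. All function and variable names are assumptions introduced for the example.

```python
import itertools
import random
import statistics

def roc_auc(labels, scores):
    """ROC-AUC as the Mann-Whitney statistic; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def column(X, j):
    return [row[j] for row in X]

def class_mean(X, y, j, cls):
    return statistics.mean([v for v, label in zip(column(X, j), y) if label == cls])

# Normalizers (stand-ins): fitted on one training column, return a transform.
def zscore(col):
    mu, sd = statistics.mean(col), statistics.pstdev(col) or 1.0
    return lambda v: (v - mu) / sd

def minmax(col):
    lo, span = min(col), (max(col) - min(col)) or 1.0
    return lambda v: (v - lo) / span

# Selectors (stand-ins): return feature indices, most useful first.
def select_by_auc(X, y):
    return sorted(range(len(X[0])), key=lambda j: -abs(roc_auc(y, column(X, j)) - 0.5))

def select_by_mean_gap(X, y):
    return sorted(range(len(X[0])),
                  key=lambda j: -abs(class_mean(X, y, j, 1) - class_mean(X, y, j, 0)))

# Models (stand-ins): fitted on training data, return a row -> score function.
def fit_signed_sum(X, y):
    signs = [1.0 if class_mean(X, y, j, 1) >= class_mean(X, y, j, 0) else -1.0
             for j in range(len(X[0]))]
    return lambda row: sum(s * v for s, v in zip(signs, row))

def fit_centroid(X, y):
    c0 = [class_mean(X, y, j, 0) for j in range(len(X[0]))]
    c1 = [class_mean(X, y, j, 1) for j in range(len(X[0]))]
    def score(row):
        d0 = sum((v - c) ** 2 for v, c in zip(row, c0))
        d1 = sum((v - c) ** 2 for v, c in zip(row, c1))
        return d0 - d1  # closer to the positive centroid gives a higher score
    return score

def run_workflow(norm, selector, k, fit, X_tr, y_tr, X_te, y_te):
    """Fit every step on the training cohort only; return (train AUC, test AUC)."""
    transforms = [norm(column(X_tr, j)) for j in range(len(X_tr[0]))]
    scale = lambda X: [[t(v) for t, v in zip(transforms, row)] for row in X]
    X_tr_n, X_te_n = scale(X_tr), scale(X_te)
    keep = selector(X_tr_n, y_tr)[:k]
    pick = lambda X: [[row[j] for j in keep] for row in X]
    model = fit(pick(X_tr_n), y_tr)
    return (roc_auc(y_tr, [model(r) for r in pick(X_tr_n)]),
            roc_auc(y_te, [model(r) for r in pick(X_te_n)]))

# Synthetic stand-in for the radiomic feature table (302 rows, 12 features).
rng = random.Random(0)
y = [i % 2 for i in range(302)]
X = [[rng.gauss(0.5 * label if j < 3 else 0.0, 1.0) for j in range(12)] for label in y]

# 80/20 split, stratified by class here so both cohorts contain both labels.
train_idx, test_idx = [], []
for cls in (0, 1):
    idx = [i for i, label in enumerate(y) if label == cls]
    cut = int(0.8 * len(idx))
    train_idx += idx[:cut]
    test_idx += idx[cut:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

NORMALIZERS = {"zscore": zscore, "minmax": minmax}
SELECTORS = {"auc": select_by_auc, "mean_gap": select_by_mean_gap}
FEATURE_COUNTS = [2, 4, 8]
MODELS = {"signed_sum": fit_signed_sum, "centroid": fit_centroid}

# One workflow per combination of normalizer, selector, feature count, and model.
results = {}
for (nn, norm), (sn, sel), k, (mn, fit) in itertools.product(
        NORMALIZERS.items(), SELECTORS.items(), FEATURE_COUNTS, MODELS.items()):
    results[(nn, sn, k, mn)] = run_workflow(norm, sel, k, fit,
                                            X_train, y_train, X_test, y_test)

# Sub-group workflows by normalization technique and compare mean test AUC.
for nn in NORMALIZERS:
    test_aucs = [te for (n, _, _, _), (_, te) in results.items() if n == nn]
    print(nn, round(statistics.mean(test_aucs), 3))
```

The same sub-grouping can be repeated over the selector or model element of each key. Running the whole grid once on a subset and once on the full table would mirror the subset-versus-full comparison described above.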
Results: None of the techniques demonstrated clear dominance. Only a handful were reproducibly beneficial on both the subset and the full dataset.
Conclusion: The results suggest that the robustness of technique performance decreased when simulating natural dataset accrual during potential clinical implementation. Future work will be required to investigate this and to establish standards for quality assurance of models.