Evaluating Supervised Learning Models for Binary Classification of Radiomic Data in Predicting Head and Neck Cancer Treatment Outcomes

Authors: Theodore Higgins Arsenault, Kyle O'Carroll, Christian Erik Petersen, Alex T. Price, Meiying Xing

Affiliation: University Hospitals Seidman Cancer Center

Abstract:

Purpose: To assess the ability of various supervised learning models to perform binary classification of radiomic data for predicting head and neck (H&N) cancer treatment outcomes.
Methods: Using an open-source dataset of 491 radiation therapy patients receiving H&N treatment, feature vectors were constructed from demographics, tumor staging/category, and radiotherapy prescription. These inputs were used to separately predict: 'Vital status: Alive/Dead', 'Local control: Yes/No', 'Regional control: Yes/No', 'Locoregional control: Yes/No', and 'Relapse-free survival: Yes/No'. Base models including Logistic Regression (LR), Decision Trees (DT), Random Forest (RF), Support Vector Classifiers (SVC), Gaussian NaΓ―ve Bayes (NB), and K-Nearest Neighbors (KNN) were used. XGBoost, AdaBoost (AB), and a soft voting classifier (SV) combining all evaluated models were also implemented. An 80:20 train-test split and SMOTE oversampling of the minority class were used to train and balance the datasets. Confusion matrices, accuracy, precision, recall, F1-score, specificity, ROC area under the curve (ROC-AUC), and cross-validation metrics were used to evaluate all models.
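The pipeline described above can be sketched in scikit-learn. This is a minimal illustration, not the authors' code: the radiomic dataset is not reproduced here, so a synthetic imbalanced dataset of the same size stands in for the 491-patient feature vectors, only three of the base models are shown inside the soft voting classifier, and plain random oversampling replaces SMOTE (the abstract's method, available as `imblearn.over_sampling.SMOTE`) to keep the sketch dependency-free.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

# Synthetic stand-in for the 491-patient feature vectors; class 1 is the
# minority class (e.g. 'Local control: No').
X, y = make_classification(n_samples=491, n_features=10, weights=[0.75],
                           random_state=0)

# 80:20 train-test split, as in the Methods.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

# Balance the training set by oversampling the minority class.
# (The abstract uses SMOTE; simple random oversampling is shown here.)
rng = np.random.default_rng(0)
minority, majority = np.flatnonzero(y_tr == 1), np.flatnonzero(y_tr == 0)
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
idx = np.concatenate([majority, minority, extra])
X_bal, y_bal = X_tr[idx], y_tr[idx]

# Soft voting over three of the base models (LR, RF, NB).
clf = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=0)),
                ("nb", GaussianNB())],
    voting="soft").fit(X_bal, y_bal)

# Evaluate with the metrics listed in the Methods.
y_pred = clf.predict(X_te)
tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
metrics = {
    "accuracy": accuracy_score(y_te, y_pred),
    "precision": precision_score(y_te, y_pred),
    "recall": recall_score(y_te, y_pred),
    "f1": f1_score(y_te, y_pred),
    "specificity": tn / (tn + fp),
    "roc_auc": roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]),
}
print(metrics)
```

Oversampling is applied only to the training fold so the held-out test set keeps its natural class imbalance, which is what the reported specificity and ROC-AUC depend on.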
Results: The top four models by average accuracy, precision, F1-score, specificity, and ROC-AUC were as follows. Accuracy: SV & RF = 0.810, AB_DT & AB_RF = 0.802. Precision: LR = 0.888, AB_LR = 0.884, RF & AB_DT = 0.832. F1-score: RF = 0.881, SV = 0.879, AB_DT & AB_RF = 0.877. Specificity: LR = 0.657, KNN = 0.498, NB = 0.440, SVC = 0.401. ROC-AUC: LR = 0.728, RF = 0.701, SV = 0.698, XGB3 = 0.696.
Conclusion: Many sites have large patient data repositories, enabling the evaluation of treatment efficacy. Learning models can identify strong correlations between features and patient outcomes and can help predict future outcomes, aiding clinicians in treatment decisions. In our dataset, models using LR, RF, and SV performed best across categories. While some models work better with specific datasets, the encapsulating soft voting classifier consistently delivered high-quality results. We aim to enhance our models by incorporating internal data, refining parameters, adding new models, and exploring additional datasets.
