Binary Classification of Lymphedema in 3DCRT Patients Using Machine Learning on 3D Dose Distribution Data

Authors: Jee Suk Chang, Hojin Kim, Jin Sung Kim, Jaehyun Seok

Affiliation: Department of Radiation Oncology, Yonsei Cancer Center, Heavy Ion Therapy Research Institute, Yonsei University College of Medicine, Department of Integrative Medicine

Abstract:

Purpose: This study aims to leverage 3D dose distribution data to develop a machine learning model capable of accurately predicting lymphedema occurrence in patients undergoing 3D conformal radiation therapy (3D CRT).

Methods: This study used a retrospective dataset of patients treated with 3D CRT at Severance Hospital between 2012 and 2019 to classify lymphedema occurrence. The dataset consisted of 119 patients (96 normal and 23 lymphedema cases) with 3D dose distribution data, split into 95 patients for training and 24 for testing. The 3D dose distributions were first preprocessed by flattening the volumetric data into one-dimensional feature vectors. Feature selection was performed in two stages. First, a variance threshold removed low-variance features, retaining the 5% of all features with the highest variance across the dataset. Then, least absolute shrinkage and selection operator (LASSO) regression was run for 1000 iterations, and only features selected in more than 850 of those iterations were retained, yielding 207 features. The selected features were used as input to a support vector machine (SVM) classifier for binary classification of lymphedema occurrence, and predictive performance was assessed by the area under the curve (AUC).
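The two-stage selection and SVM classification described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation. Several details are assumptions that the abstract does not state: the volume shape, the five hypothetical informative voxels, treating each LASSO "iteration" as a bootstrap resample of the training set, the LASSO penalty `alpha=0.05`, the RBF kernel, and the reduced count of 200 iterations (the abstract reports 1000 iterations with a cutoff of 850, which is the same 85% fraction).

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the cohort: 119 patients, 23 with lymphedema.
n_patients = 119
y = np.zeros(n_patients, dtype=int)
y[:23] = 1

# Hypothetical 10 x 20 x 10 dose volumes, flattened to 1D feature vectors.
volumes = rng.normal(size=(n_patients, 10, 20, 10))
X = volumes.reshape(n_patients, -1)            # shape (119, 2000)
scales = rng.uniform(0.1, 1.0, size=X.shape[1])
scales[:5] = 2.0                               # assumed informative voxels
X = X * scales
X[:, :5] += 4.0 * y[:, None]                   # inject a class-dependent shift

# 95 / 24 split as in the abstract (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=24, stratify=y, random_state=0
)

# Stage 1: keep the 5% of features with the highest training-set variance.
n_keep = int(0.05 * X.shape[1])
top_var = np.argsort(X_train.var(axis=0))[-n_keep:]

scaler = StandardScaler().fit(X_train[:, top_var])
Z_train = scaler.transform(X_train[:, top_var])
Z_test = scaler.transform(X_test[:, top_var])

# Stage 2: repeated LASSO fits; keep features with a nonzero coefficient
# in more than 85% of the fits.
n_iter, min_fraction = 200, 0.85
counts = np.zeros(n_keep)
for _ in range(n_iter):
    idx = rng.integers(0, len(y_train), size=len(y_train))  # bootstrap rows
    lasso = Lasso(alpha=0.05, max_iter=5000).fit(Z_train[idx], y_train[idx])
    counts += lasso.coef_ != 0
selected = np.flatnonzero(counts / n_iter > min_fraction)

# Stage 3: SVM on the selected features, scored by AUC on the test set.
svm = SVC(kernel="rbf").fit(Z_train[:, selected], y_train)
auc = roc_auc_score(y_test, svm.decision_function(Z_test[:, selected]))
print(f"{selected.size} features selected, test AUC = {auc:.3f}")
```

Counting how often each feature survives across repeated fits tends to discard features that LASSO picks up only by chance in a particular resample, which matters when the number of voxels far exceeds the number of patients.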

Results: After feature selection, the SVM model achieved an accuracy of 83%, a weighted F1-score of 0.82, and an AUC of 0.747. Precision was 0.86 and 0.67, and recall was 0.95 and 0.40, for the non-lymphedema and lymphedema classes, respectively. The performance gap between the two classes may arise from the class imbalance in the dataset, which could be addressed by employing synthetic oversampling and developing a new network architecture.
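As a consistency check on the reported metrics, the short calculation below shows one confusion matrix for the 24 test patients that reproduces them. The counts are hypothetical: the abstract reports neither the confusion matrix nor the class split of the test set, so the 19 non-lymphedema and 5 lymphedema patients assumed here are an inference from the rounded figures, reading each pair of values as non-lymphedema first and lymphedema second.

```python
# Hypothetical test-set counts (not reported in the abstract).
# Positive class = lymphedema.
tn, fp, fn, tp = 18, 1, 3, 2

n_neg, n_pos = tn + fp, fn + tp            # 19 non-lymphedema, 5 lymphedema
accuracy = (tn + tp) / (n_neg + n_pos)

prec_neg, rec_neg = tn / (tn + fn), tn / (tn + fp)
prec_pos, rec_pos = tp / (tp + fp), tp / (tp + fn)

f1_neg = 2 * prec_neg * rec_neg / (prec_neg + rec_neg)
f1_pos = 2 * prec_pos * rec_pos / (prec_pos + rec_pos)

# Weighted F1 averages the per-class F1 scores by class support.
weighted_f1 = (n_neg * f1_neg + n_pos * f1_pos) / (n_neg + n_pos)

print(round(accuracy, 2), round(weighted_f1, 2))   # 0.83 0.82
print(round(prec_neg, 2), round(prec_pos, 2))      # 0.86 0.67
print(round(rec_neg, 2), round(rec_pos, 2))        # 0.95 0.4
```

Under these assumed counts only two of five lymphedema cases are detected, which illustrates how a high overall accuracy can coexist with low minority-class recall.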

Conclusion: This study demonstrated the potential of combining 3D dose distribution data with two-stage feature selection (variance thresholding followed by LASSO) to predict lymphedema in 3D CRT patients, with the SVM model achieving an AUC of 0.747.
