Predicting Hormone Receptor Status in Breast Cancer Using Mammographic and Sonographic Data and Machine Learning Models 📝

Authors: Zahra Bagherpour, Manijeh Beigi, Pedram Fadavi, Faraz Kalantari, Moghadaseh Khaleghibizaki, Hengameh Nazari, Mojtaba Safari, Sepideh Soltani 👨‍🔬

Affiliations: Department of Radiation Oncology, School of Medicine, Iran University of Medical Sciences; Department of Radiation Oncology, School of Medicine, Emory University and Winship Cancer Institute; Department of Radiation Oncology, Iran University of Medical Sciences; University of Arkansas for Medical Sciences; Department of Radiation Physics, The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences 🌍

Abstract:

Purpose: This study evaluates whether routinely available mammographic and sonographic data, combined with machine learning (ML) models, can predict key molecular markers in breast cancer: estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) status. The goal is a non-invasive, accessible diagnostic tool that can support clinical workflows and decision-making.
Methods: We collected mammographic and sonographic data from 149 patients with confirmed breast cancer. Features including tumor size, shape, margin, vascularity, and calcification were extracted from clinical imaging reports, together with demographic data. Using these features, we developed and validated ML models, including logistic regression (LR) and random forest (RF), to predict ER, PR, and HER2 status. Performance was assessed using the area under the receiver operating characteristic curve (AUC).
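The abstract does not describe the modelling code. As a minimal sketch, assuming each report is reduced to a few categorical descriptors plus age, the features can be one-hot encoded and passed to a logistic regression fitted by plain gradient descent. The category levels (such as `"oval"` or `"spiculated"`), the helper names, and the toy records below are hypothetical illustrations, not the study's coding scheme or data; a real analysis would more likely use an established library.

```python
import math

# Hypothetical category levels for report-derived descriptors (not the study's coding).
CATEGORIES = {
    "shape": ["oval", "round", "irregular"],
    "margin": ["circumscribed", "indistinct", "spiculated"],
    "calcification": ["absent", "present"],
    "vascularity": ["absent", "internal", "peripheral"],
}


def encode(record):
    """One-hot encode the categorical descriptors and append a scaled age."""
    vec = []
    for name, levels in CATEGORIES.items():
        vec.extend(1.0 if record[name] == level else 0.0 for level in levels)
    vec.append(record["age"] / 100.0)  # crude scaling keeps age near the 0-1 range
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by full-batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the positive class (e.g. receptor-positive)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Invented toy records (label 1 = receptor-positive); not study data.
records = [
    ({"shape": "oval", "margin": "circumscribed", "calcification": "absent",
      "vascularity": "absent", "age": 48}, 1),
    ({"shape": "oval", "margin": "circumscribed", "calcification": "absent",
      "vascularity": "peripheral", "age": 62}, 1),
    ({"shape": "round", "margin": "circumscribed", "calcification": "absent",
      "vascularity": "absent", "age": 57}, 1),
    ({"shape": "irregular", "margin": "spiculated", "calcification": "present",
      "vascularity": "internal", "age": 45}, 0),
    ({"shape": "irregular", "margin": "indistinct", "calcification": "present",
      "vascularity": "internal", "age": 52}, 0),
    ({"shape": "irregular", "margin": "spiculated", "calcification": "absent",
      "vascularity": "internal", "age": 60}, 0),
]

X = [encode(rec) for rec, _ in records]
y = [label for _, label in records]
w, b = fit_logistic(X, y)
for x, label in zip(X, y):
    print(label, round(predict_proba(w, b, x), 3))
```

This prints training-set probabilities only. Reported AUCs such as those in this study would come from held-out validation data, which the sketch does not reproduce.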
Results: RF achieved AUCs of 0.74 for PR, 0.78 for ER, and 0.85 for HER2 status (negative vs. positive); LR achieved AUCs of 0.69, 0.77, and 0.87, respectively. RF therefore achieved higher AUCs than LR for ER and PR, whereas LR was slightly higher for HER2. Feature importance analysis showed that calcification (mammography) and tumor shape (sonography) were the most influential features for HER2 prediction, while vascularity and tumor shape were the most important for ER and PR prediction.
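The abstract reports AUCs and a feature importance ranking but does not state how importance was computed. As one possible approach, not necessarily the authors', the sketch below computes AUC as the probability that a randomly chosen positive case scores above a randomly chosen negative case (ties count as half), and estimates permutation importance as the mean drop in AUC when a single feature column is shuffled. The scoring function `toy_score` and its data are hypothetical stand-ins for a fitted RF or LR model.

```python
import random


def auc(scores, labels):
    """AUC as the Mann-Whitney probability P(score_pos > score_neg), ties = 0.5."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_importance(score_fn, X, y, n_repeats=20, seed=0):
    """Mean drop in AUC when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = auc([score_fn(row) for row in X], y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - auc([score_fn(row) for row in X_perm], y))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Hypothetical fitted model: large weight on feature 0, small on feature 1, ignores feature 2.
def toy_score(row):
    return 2.0 * row[0] + 0.3 * row[1]


X = [[1, 0, 5], [1, 1, 3], [0, 1, 4], [0, 0, 2], [1, 0, 1], [0, 1, 6]]
y = [1, 1, 0, 0, 1, 0]
baseline, imps = permutation_importance(toy_score, X, y)
print(baseline, imps)
```

In this toy example, shuffling feature 1 leaves the AUC unchanged because feature 0 alone already separates the classes. Permutation importance thus measures a feature's contribution to discrimination rather than the size of its model weight. Impurity-based importance, which tree ensembles often report by default, can rank features differently.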
Conclusion: This study demonstrates that ML models built on routinely reported mammographic and sonographic features can predict ER, PR, and HER2 status in breast cancer patients, with AUCs ranging from 0.69 to 0.87. Because the approach relies on features already present in clinical imaging reports rather than on computationally intensive extraction of hidden image features through deep learning or radiomics pipelines, it aims to provide an efficient and accessible model for clinical use.
