Performance Analysis of Various Deep Learning Networks for Classification of True and False Positive 18F-PSMA Findings

Author: Vasiliki Chatzipavlidou, Ilias Gatos, George C. Kagadis, Theodoros Kalathas, Paraskevi Katsakiori, Anna Makridou, Dimitris N. Mihailidis, Nikos Papathanasiou, Ioanna Stamouli, Stavros Tsantis

Affiliation: Theageneio Hospital, University of Pennsylvania, University of Patras

Abstract:

Purpose: To compare the performance of multiple deep learning (DL) networks, including DenseNet201, InceptionV3, MobileNetV2, EfficientNetB2, NASNetMobile, VGG19, ResNet50, and Xception, in classifying true positive (TP) and false positive (FP) PSMA findings in 18F-PSMA PET/CT images.
Methods: A clinical dataset of 71 male patients who underwent 18F-PSMA PET/CT imaging was analyzed. Malignant regions were labeled as TP, and benign regions as FP, based on the evaluation of a nuclear medicine physician in patients with prostate cancer (PCa) biochemical recurrence. Maximum Intensity Projections (MIPs) were created on the PET coronal plane series for each patient, and regions of interest (ROIs) were cropped manually, resulting in a total of 164 ROIs (76 FP and 81 TP). The images were augmented on the fly during training with rotations, shifts, flips, shear, and zoom via ImageDataGenerator, resulting in a dataset of more than 12,000 images. Each model was trained using five-fold cross-validation, and accuracy, sensitivity, specificity, and Receiver Operating Characteristic (ROC) Area Under the Curve (AUC) were calculated for comparison.
Results: The DL networks achieved a mean accuracy ranging from 79.7% to 84.7%, a mean ROC AUC between 88.4% and 91.4%, a mean specificity from 76.8% to 87.4%, and a mean sensitivity from 78.8% to 88.3%. MobileNetV2 achieved the highest mean accuracy (84.7±2.6%), mean ROC AUC (91.4±3.4%), and specificity (87.4±7.3%), with competitive sensitivity (83.0±8.1%). In contrast, NASNetMobile had the lowest sensitivity (78.8±11.8%) and accuracy (79.7±6.3%). While InceptionV3 achieved the highest sensitivity (88.3±7.7%), it had the lowest specificity (76.8±8.9%) and ROC AUC (88.4±5.9%).
Conclusion: MobileNetV2 outperformed the other models while NASNetMobile exhibited the lowest performance across key metrics. These results highlight the importance of selecting the most suitable model to ensure accurate classification of PSMA findings in PET/CT images.
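
The evaluation protocol described in the Methods (five-fold cross-validation, then accuracy, sensitivity, specificity, and ROC AUC reported as mean ± standard deviation across folds) can be illustrated with a minimal sketch. This is not the authors' code: the function names (stratified_folds, binary_metrics, roc_auc, mean_std), the use of stratified folds, and the label convention (1 = TP finding, 0 = FP finding) are assumptions made for illustration, and the sketch uses only the Python standard library rather than the deep learning framework used in the study.

```python
import random
from statistics import mean, stdev


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with a roughly equal class ratio.

    Assumption: stratification is used here for illustration; the abstract
    only states that five-fold cross-validation was applied.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for labels 1 (TP) and 0 (FP).

    Requires at least one positive and one negative sample in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of malignant ROIs detected
        "specificity": tn / (tn + fp),  # fraction of benign ROIs rejected
    }


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive sample scores
    higher than a random negative one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_std(values):
    """Summarize a per-fold metric as (mean, sample standard deviation)."""
    return mean(values), stdev(values)
```

In such a setup, each fold in turn would serve as the held-out set while a network is trained on the remaining four; binary_metrics and roc_auc would be computed on the held-out predictions, and mean_std would be applied to the five per-fold values of each metric to produce figures in the "mean±SD" form used in the Results.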
