Feasibility of Using a Convolutional Neural Network to Predict Physician Evaluation of Synthetic Medical Images

Author: Sofia Beer, Menal Bhandari, Alec Block, Nader Darwish, Joseph Dingillo, Sebastien A. Gros, Hyejoo Kang, Andrew Keeler, Rajkumar Kettimuthu, Jason Patrick Luce, Ha Nguyen, John C. Roeske, George K. Thiruvathukal, Austin Yunker

Affiliation: Data Science and Learning Division, Argonne National Laboratory; Department of Radiation Oncology, Stritch School of Medicine, Loyola University Chicago; Cardinal Bernardin Cancer Center, Loyola University Chicago; Department of Computer Science, Loyola University Chicago

Abstract:

Purpose: Synthetic medical images generated by artificial intelligence (AI) are seeing increased use in radiology and radiation oncology. Physician observer studies are an ideal way to evaluate the usability of synthetic images, but they are labor-intensive. Alternatively, a model observer could be used to predict how a physician would evaluate a synthetic image. Therefore, we performed a physician observer study of real and synthetic medical images and examined the feasibility of using a convolutional neural network (CNN) as a model observer to predict physician Likert scores consistent with the observer study.
Materials/Methods: This study used twenty-three head-and-neck cone-beam CT (CBCT) patient scans. For each patient, two image volumes were reconstructed: a clinical-dose volume using the full projection data, and a simulated low-dose volume using one-eighth of the projection data. A U-net neural network was trained to transform the low-dose images into synthetic clinical-dose images. Three radiation oncologists assigned each synthetic clinical-dose and clinical-dose image a Likert score from 1 to 5, rating how well soft-tissue features could be delineated. These Likert scores were used as labels for transfer learning with the pretrained AlexNet CNN image classifier, with 6,919 images (80%) used for training and 1,730 (20%) used for testing.
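To make the transfer-learning step concrete, the sketch below shows one way such a model observer could be set up with PyTorch/torchvision: an ImageNet-pretrained AlexNet has its final classifier layer replaced with a 5-way output (Likert scores 1 through 5), and the labeled slices are split 80/20 for training and testing. This is a minimal illustration under stated assumptions, not the authors' code; the random tensors stand in for the labeled CBCT slices, and the optimizer, learning rate, batch size, and epoch count are placeholders chosen for demonstration only.

```python
# Minimal sketch: fine-tuning an ImageNet-pretrained AlexNet to predict
# 5-level Likert scores. Dummy data and hyperparameters are illustrative,
# not the study's actual settings.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load pretrained AlexNet and replace the final classifier layer with a
# 5-way output (Likert scores 1-5 mapped to class indices 0-4).
model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 5)
model = model.to(device)

# Placeholder data: in practice these would be the labeled CT image slices.
images = torch.randn(100, 3, 224, 224)   # dummy 3-channel image tensors
labels = torch.randint(0, 5, (100,))     # dummy Likert-score labels
dataset = TensorDataset(images, labels)

# 80/20 train/test split, as in the study.
n_train = int(0.8 * len(dataset))
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
test_loader = DataLoader(test_set, batch_size=16)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Standard fine-tuning loop (epoch count is illustrative).
for epoch in range(5):
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

# Evaluate classification accuracy on the held-out slices.
model.eval()
correct = total = 0
with torch.no_grad():
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
print(f"test accuracy: {correct / total:.3f}")
```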
Results: The mean Likert score assigned by physicians across all image volumes was 3.2, with a standard deviation of 1.2. The distribution of ground-truth labels was 25.8%, 21.5%, 25.8%, 16.1%, and 10.8% across labels 1 through 5, respectively. The CNN predicted physician Likert scores with an accuracy of 95.3%.
Conclusion: This proof-of-principle study indicates that a CNN can be used to predict physician Likert ratings of synthetic medical images with approximately 95% accuracy. This provides a potential model-observer framework for evaluating synthetic medical image quality with minimal physician input beyond initial network training.
