Noise Sensitivity of Benchmark Whole-Body CT Segmentation Models: TotalSegmentator and Vista3D Performance on an Independent Dataset

Authors: Samuel L. Brady, Shruti Hegde, Alexander Knapp, Usman Mahmood, Joseph G. Meier, Elanchezhian Somasundaram, Zachary Taylor

Affiliation: Cincinnati Children's Hospital Medical Center, Department of Medical Physics, Memorial Sloan Kettering Cancer Center

Abstract:

Purpose:
To assess how two benchmark multi-organ CT segmentation models respond to varying image noise levels.
Methods:
This study used the pediatric CT dataset from The Cancer Imaging Archive (TCIA), comprising 357 cases, to assess the performance of Vista3D, a foundational CT segmentation model, and TotalSegmentator, a widely recognized benchmark segmentation tool. Segmentation accuracy was measured using the Dice Similarity Coefficient (DSC) across 15 anatomical organs, including the right adrenal gland, left adrenal gland, bladder, duodenum, esophagus, gallbladder, heart, right kidney, left kidney, liver, prostate, small bowel, spleen, and stomach. Image noise was quantified as the minimum standard deviation within a 32×32×4 voxel patch centered on the liver, which was chosen for its large size and homogeneous Hounsfield Unit (HU) values. To limit the influence of outliers, the highest 2% of noise measurements were excluded. Pearson's correlation coefficient (r) and corresponding p-values were computed to examine the relationship between noise level and segmentation performance.
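The measurements described above can be illustrated with a minimal Python sketch using NumPy. It is not the authors' code. The function names (dice, liver_noise, noise_dsc_correlation), the (x, y, z) axis order, and the choice to center a single patch on the liver-mask centroid are assumptions made for illustration. The abstract does not specify how the minimum standard deviation is selected, so the sketch computes the standard deviation of one patch only. It also returns only r; a p-value would need an additional statistics routine.

```python
import numpy as np


def dice(pred, truth):
    """Dice Similarity Coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return float("nan")  # organ absent from both masks: DSC is undefined
    return 2.0 * np.logical_and(pred, truth).sum() / denom


def liver_noise(ct_hu, liver_mask, patch=(32, 32, 4)):
    """Standard deviation of HU values in one patch centered on the liver.

    Assumption: the patch is centered on the centroid of the liver mask and
    is clipped wherever it would extend past the volume border.
    """
    centre = np.argwhere(liver_mask).mean(axis=0).round().astype(int)
    slices = tuple(
        slice(max(c - p // 2, 0), c - p // 2 + p) for c, p in zip(centre, patch)
    )
    return float(ct_hu[slices].std())


def noise_dsc_correlation(noise, dsc, drop_top_pct=2.0):
    """Pearson r between per-case noise and DSC, excluding the noisiest cases."""
    noise = np.asarray(noise, dtype=float)
    dsc = np.asarray(dsc, dtype=float)
    # Keep cases at or below the (100 - drop_top_pct)th noise percentile.
    keep = noise <= np.percentile(noise, 100.0 - drop_top_pct)
    return float(np.corrcoef(noise[keep], dsc[keep])[0, 1])
```

In this sketch, each case would contribute one noise value and one DSC per organ and model, and noise_dsc_correlation would be called once per organ and model on those paired lists.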
Results:
The mean DSC across the 15 organs was 0.68 ± 0.09 for Vista3D and 0.71 ± 0.08 for TotalSegmentator. Both models demonstrated a statistically significant positive correlation between overall DSC and noise (Vista3D: r = 0.25, p < 0.001; TotalSegmentator: r = 0.27, p < 0.001). However, for certain organs such as the gallbladder, the correlation between DSC and noise was not significant for either model (Vista3D: r = 0.03, p = 0.6; TotalSegmentator: r = 0.05, p = 0.3). In contrast, for the left kidney, TotalSegmentator exhibited a stronger correlation with noise (r = 0.13, p = 0.02) than Vista3D (r = 0.07, p = 0.2).
Conclusion:
AI-based CT segmentation models showed overall improved segmentation accuracy with increasing noise on the TCIA pediatric dataset. However, the degree of sensitivity varied across organs, underscoring the need to establish noise thresholds before deploying these models in clinical settings.
