Author: Hassan Bagher-Ebadian, Ahmed I Ghanem, Joshua P. Kim, Chengyin Li, Rafi Ibn Sultan, Kundan S Thind, Dongxiao Zhu
Affiliation: Wayne State University, Department of Radiation Oncology, Henry Ford Health-Cancer, Detroit, MI and Alexandria Department of Clinical Oncology, Faculty of Medicine, Alexandria University, Henry Ford Health
Purpose: Accurate segmentation of the left anterior descending (LAD) artery in free-breathing 3D treatment-planning CT is crucial for radiotherapy but remains challenging because of the artery's small size, complex and variable shape, and low contrast. Convolutional neural networks (CNNs) excel at capturing local features and vision transformers at capturing global context, but neither balances the two well enough to overcome these challenges. This study introduces NA-Unetr, a novel 3D segmentation architecture that uses Neighborhood Attention to integrate local and global context and thereby improve segmentation performance.
Methods: NA-Unetr was evaluated on an IRB-approved dataset of free-breathing 3D CT scans from 20 lung cancer patients, with physician-delineated LAD contours serving as ground truth. The dataset was split into a 10% hold-out test set and a 90% training-validation set, on which three-fold cross-validation was performed. The encoder-decoder architecture comprises four stages and incorporates 3D Neighborhood Attention Transformers that process local and then global context sequentially, from low-level to high-level features. A multi-loss strategy combined DiceFocal Loss for segmentation with Hausdorff Loss for refined boundary delineation, and the two loss contributions were balanced using homoscedastic uncertainty weighting. Performance was evaluated using the Dice Similarity Coefficient (DSC), the 95th-percentile Hausdorff Distance (HD95), and the Average Surface Distance (ASD).
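The homoscedastic uncertainty weighting named in Methods can be illustrated with a minimal sketch. It assumes one commonly used formulation, total = sum_i(exp(-s_i) * L_i + s_i), where s_i is a learnable log-variance for loss i; the abstract does not state which variant the authors used. The function name and the numeric loss values are hypothetical, and in training the s_i would be optimized jointly with the network weights rather than set by hand.

```python
import math


def uncertainty_weighted_loss(losses, log_vars):
    """Combine task losses with log-variances s_i = log(sigma_i^2).

    Assumed formulation (not confirmed by the abstract):
        total = sum_i( exp(-s_i) * L_i + s_i )
    A larger s_i down-weights loss i, while the additive s_i term
    penalizes inflating the variance without bound.
    """
    if len(losses) != len(log_vars):
        raise ValueError("losses and log_vars must have the same length")
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


# Hypothetical per-batch values for a DiceFocal loss and a Hausdorff loss.
dice_focal, hausdorff = 0.62, 4.80

# With both log-variances at zero, the losses are simply summed.
balanced = uncertainty_weighted_loss([dice_focal, hausdorff], [0.0, 0.0])

# Raising the Hausdorff log-variance shrinks that term's contribution.
shifted = uncertainty_weighted_loss([dice_focal, hausdorff], [0.0, 1.5])
```

This weighting keeps the larger-magnitude boundary loss from dominating the overlap loss without hand-tuning a fixed coefficient.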
Results: NA-Unetr outperformed both comparison models on all three metrics. It achieved a mean DSC of 37.07 ± 13.04%, significantly higher than the CNN-based nnU-Net (26.40 ± 7.68%, p = 0.003) and numerically higher than the transformer-based Swin UNETR (28.70 ± 16.30%, p = 0.081), although the latter difference did not reach significance at the conventional 0.05 level. It also reduced HD95 and ASD (33.68 ± 11.50 mm and 6.79 ± 2.03 mm, respectively) relative to nnU-Net (43.01 ± 9.02 mm and 14.41 ± 2.82 mm) and Swin UNETR (38.99 ± 31.31 mm and 8.15 ± 3.14 mm).
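For readers less familiar with the DSC values reported here, the following minimal sketch shows how the score is computed for two binary masks and why it is sensitive to small offsets in thin structures. The function name and the toy one-dimensional masks are hypothetical, and treating two empty masks as perfect agreement is an assumption of this sketch, not a convention stated in the abstract.

```python
def dice_percent(pred, truth):
    """Dice similarity coefficient, in percent, for two flat binary masks.

    DSC = 2 * |pred AND truth| / (|pred| + |truth|)
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 100.0
    return 100.0 * 2.0 * intersection / total


# Toy example: a three-voxel segment predicted one voxel off.
truth = [0, 0, 1, 1, 1, 0, 0, 0]
pred = [0, 0, 0, 1, 1, 1, 0, 0]
score = dice_percent(pred, truth)
```

In this toy case a one-voxel shift leaves only two thirds of the maximum score, which illustrates why absolute DSC values for a small vessel such as the LAD tend to be much lower than those for large organs.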
Conclusion: NA-Unetr addresses the challenges of LAD segmentation by balancing local precision with global context, improving both volumetric overlap and boundary accuracy relative to the comparison models. Further validation on larger datasets is needed to confirm its generalizability.