BEST IN PHYSICS MULTI-DISCIPLINARY: Foundation Model-Empowered Unsupervised 3D Deformable Medical Image Registration

Author: Xianjin Dai, PhD, Zhuoran Jiang, Lei Ren, Lei Xing, Zhendong Zhang

Affiliation: University of Maryland School of Medicine, Department of Radiation Oncology, Stanford University, Duke University, Stanford University

Abstract:

Purpose: Unsupervised deep learning has shown great promise in deformable image registration (DIR). These methods update model weights to optimize image similarity without requiring ground-truth deformation vector fields (DVFs). However, they inherently face ill-conditioning caused by structural ambiguities. This study aims to address this issue by integrating the implicit anatomical understanding of vision foundation models into a multi-scale unsupervised framework for accurate and robust DIR.
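To make the unsupervised objective concrete, the following is a minimal NumPy sketch, not the authors' implementation. It warps the moving volume with a candidate DVF and scores the result against the fixed volume, so no ground-truth DVF is needed. The function names, the mean-squared-error similarity, the smoothness regularizer, and `reg_weight` are all assumptions chosen for illustration; the abstract does not specify them. Nearest-neighbour sampling is used only for brevity, whereas training a network would typically need a differentiable (for example trilinear) sampler.

```python
import numpy as np

def warp_nearest(moving, dvf):
    """Warp a 3D volume with a dense DVF using nearest-neighbour sampling.

    moving: array of shape (D, H, W)
    dvf:    array of shape (3, D, H, W), displacements in voxels
    """
    grid = np.indices(moving.shape).astype(float)  # identity coordinates
    coords = np.rint(grid + dvf).astype(int)       # where each output voxel samples from
    for axis, size in enumerate(moving.shape):
        coords[axis] = np.clip(coords[axis], 0, size - 1)  # clamp at the borders
    return moving[coords[0], coords[1], coords[2]]

def smoothness(dvf):
    """Mean squared finite difference of the DVF (an assumed regularizer)."""
    return sum(np.mean(np.diff(dvf, axis=a) ** 2) for a in (1, 2, 3)) / 3.0

def unsupervised_loss(moving, fixed, dvf, reg_weight=0.1):
    """Image dissimilarity plus a smoothness penalty; no ground-truth DVF is used."""
    warped = warp_nearest(moving, dvf)
    similarity = np.mean((warped - fixed) ** 2)    # MSE stands in for the similarity term
    return similarity + reg_weight * smoothness(dvf)
```

A DVF that correctly undoes a known shift should score lower than the identity (all-zero) DVF, which is the signal an unsupervised model would follow during training.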

Methods: Our method takes moving and fixed images as inputs and leverages a pre-trained encoder from a vision foundation model to extract implicit features. These features are merged with those extracted by convolutional adaptors to incorporate convolutional inductive bias. Correlation-aware multi-layer perceptrons decode the features into DVFs. Additionally, a pyramid architecture is implemented to capture multi-range dependencies, further enhancing the robustness and accuracy of DIR. A multi-modality cross-institutional database (150 cardiac cine MR and 40 liver CT pairs) was used to evaluate the performance of our method both qualitatively and quantitatively, using the Dice similarity coefficient and anatomical landmark errors.
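The two quantitative metrics named above can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the evaluation code used in the study: the function names, the single-label Dice, the convention of returning 1.0 when both masks are empty, and the per-axis `spacing_mm` argument are hypothetical choices.

```python
import numpy as np

def dice(seg_a, seg_b, label=1):
    """Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), for one label."""
    a = np.asarray(seg_a) == label
    b = np.asarray(seg_b) == label
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2.0 * np.logical_and(a, b).sum() / denom

def landmark_error_mm(warped_pts, fixed_pts, spacing_mm=(1.0, 1.0, 1.0)):
    """Per-landmark Euclidean distance in mm between paired voxel coordinates."""
    diff = np.asarray(warped_pts, float) - np.asarray(fixed_pts, float)
    return np.linalg.norm(diff * np.asarray(spacing_mm, float), axis=1)
```

In a registration setting, `dice` would compare the warped moving segmentation with the fixed segmentation, and `landmark_error_mm` would compare landmarks mapped through the DVF with their annotated positions in the fixed image. The mean and standard deviation of these values give summaries in the "mean ± std" form reported in the Results.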

Results: Our model generated realistic and accurate DVFs. Qualitatively, moving images deformed by our method showed excellent similarity to the fixed images. Quantitatively, our method achieved a registration Dice score of 0.869 ± 0.093 for cardiac MRI, substantially surpassing the state-of-the-art (SOTA) score of 0.815 ± 0.124. For liver CT, we attained an average landmark error of 1.60 ± 1.44 mm compared to 2.65 ± 2.19 mm, demonstrating a significant improvement. Ablation tests further verified the effectiveness of integrating foundation features to improve DIR accuracy (p < 0.05).

Conclusion: The proposed method demonstrates significant advances in DIR for multi-modality images with complex structures and low contrast, making it a powerful tool for a wide range of applications in medical image analysis.
