Latent Diffusion for 3D CT Reconstruction from Biplanar X-Rays

Authors: Guha Balakrishnan, Osama R. Mawlawi, Yiran Sun, Ashok Veeraraghavan

Affiliations: Rice University, UT MD Anderson Cancer Center

Abstract:

Purpose:
Previous deep learning (DL) techniques such as X2CT-GAN [1] have shown great promise in reconstructing realistic CT volumes from biplanar X-rays; however, they introduce numerous artifacts in the reconstructed images. This work aims to improve on X2CT-GAN by leveraging latent diffusion models [2] to generate 3D CT volumes from biplanar X-rays.
Methods:
First, we learn a compact 3D latent space for CT volumes and then train a conditional diffusion model within this space. Second, we design the conditioning signal by integrating information from the biplanar 2D X-rays into a unified representation. Specifically, inspired by pixelNeRF [3], we extract 2D features from each X-ray image and then use ray tracing based on the imaging geometry to combine these features into a coherent 3D volume. We evaluate our method on the public Lung Image Database Consortium (LIDC) CT dataset [4]. The LIDC dataset includes 1018 patients, which we randomly split into 868/50/100 train/validation/test groups. We resampled each scan to 1 mm³ resolution and generated the corresponding biplanar X-rays using the TIGRE DRR generator [5]. We normalized all paired data to the range [0, 1] before training. All experiments were implemented in PyTorch and executed on NVIDIA A100 GPUs. The model was trained with a batch size of 1 using the Adam optimizer and a fixed learning rate of 1 × 10⁻⁴. The generated CT scans were compared with the ground-truth CT scans, as well as with those from X2CT-GAN, using PSNR (dB) and SSIM.
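The feature-fusion step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an idealized parallel-beam geometry with exactly orthogonal frontal and lateral views, so that tracing a ray reduces to copying each 2D feature along the projection axis, whereas the method described above traces rays according to the actual imaging geometry. The function name `backproject_biplanar`, the (z, y, x) axis convention, and the use of channel concatenation to fuse the two views are all assumptions made for illustration.

```python
import numpy as np


def backproject_biplanar(feat_frontal, feat_lateral):
    """Fuse two 2D feature maps into one 3D feature volume.

    Assumed (hypothetical) conventions:
      - volume axes are (z, y, x);
      - the frontal view projects along y, so feat_frontal is (C, Z, X);
      - the lateral view projects along x, so feat_lateral is (C, Z, Y);
      - rays are parallel, so back-projection is a copy along the ray axis.
    Returns an array of shape (2C, Z, Y, X).
    """
    c, z, x = feat_frontal.shape
    c_lat, z_lat, y = feat_lateral.shape
    if c != c_lat or z != z_lat:
        raise ValueError("views must share channel count and z extent")

    # Every voxel on a frontal ray (fixed z, x) receives that pixel's feature.
    vol_frontal = np.broadcast_to(feat_frontal[:, :, None, :], (c, z, y, x))
    # Every voxel on a lateral ray (fixed z, y) receives that pixel's feature.
    vol_lateral = np.broadcast_to(feat_lateral[:, :, :, None], (c, z, y, x))

    # Stack the two back-projected volumes along the channel axis.
    return np.concatenate([vol_frontal, vol_lateral], axis=0)


# Toy example: 2 channels, volume of size Z=3, Y=4, X=5.
feat_frontal = np.arange(2 * 3 * 5, dtype=np.float32).reshape(2, 3, 5)
feat_lateral = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
fused = backproject_biplanar(feat_frontal, feat_lateral)
```

In this sketch each voxel ends up with the features of the two pixels it projects onto, one per view, which is the property a conditioning volume for the diffusion model would need. A cone-beam geometry would replace the broadcast with per-voxel projection and interpolation.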
Results:
The performance of our algorithm compared with that of X2CT-GAN is shown in Fig. 1. The average PSNR (dB)/SSIM was 28.08/0.699 for our method and 26.59/0.639 for X2CT-GAN.
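For reference, PSNR on volumes normalized to [0, 1] can be computed as in the sketch below. This is the standard definition, 10 log10(R² / MSE) with data range R, and not necessarily the exact evaluation code used for the numbers above; the function name `psnr` and the choice R = 1.0 (which follows from the [0, 1] normalization) are assumptions.

```python
import numpy as np


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped arrays."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("inputs must have the same shape")

    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical volumes
    return float(10.0 * np.log10(data_range ** 2 / mse))


# Toy volumes: a constant error of 0.1 gives MSE = 0.01, i.e. 20 dB.
ground_truth = np.zeros((4, 4, 4))
reconstruction = np.full((4, 4, 4), 0.1)
score = psnr(ground_truth, reconstruction)
```

Because PSNR is logarithmic in the mean squared error, a fixed gain in dB corresponds to a fixed multiplicative reduction in MSE regardless of the starting level.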
Conclusion:
Our results demonstrate that our method recovers high-quality CT images that preserve geometric structure and sharp edges by using a latent diffusion model conditioned on fused features from biplanar X-ray images.
