Authors: Shreyas Anil, Jason Chan, Arushi Gulati, Yannet Interian, Hui Lin, Benedict Neo, Andrea Park, Bhumika Srinivas
Affiliations: Department of Otolaryngology-Head and Neck Surgery, University of California San Francisco; Department of Data Science, University of San Francisco; Department of Radiation Oncology, University of California San Francisco
Purpose: Hospital readmission prediction models often rely on structured Electronic Health Record (EHR) data, overlooking critical insights from unstructured clinical notes. This study presents a multimodal attention fusion model that integrates both data types to improve readmission prediction for head and neck (H&N) cancer patients.
Methods: We compiled EHR data for H&N cancer patients from diagnosis to discharge, yielding 388 training samples and 97 test samples. Using GPT-4o, we summarized the unstructured physician notes to extract key information. We also generated temporal summaries to capture patient trajectories, including discharge details, length of stay, ICU duration, critical events, surgical interventions, infection records, follow-up gaps, and timestamped vital statistics. To process these multimodal inputs, we used two encoder models: Bio+ClinicalBERT for notes and temporal summaries, and Flan-T5 for vital statistics. We then fused the resulting embeddings with an eight-head attention mechanism, enabling cross-modality integration, and classified hospital readmission outcomes with a fully connected network (a sketch of this architecture follows below). We evaluated performance across (1) single-source models, which used one component alone (notes, temporal summaries, or vital statistics), and (2) a full fusion model, which integrated all modalities. To assess effectiveness, we measured F1 score, AUROC, accuracy, precision, and recall, identifying the optimal predictive setup.
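The abstract does not include code, so the following is a minimal sketch of one plausible implementation of the described fusion architecture, assuming PyTorch and Hugging Face Transformers. The specific checkpoints (emilyalsentzer/Bio_ClinicalBERT, google/flan-t5-base), first-token pooling, mean pooling before the classifier, and the 256-unit hidden layer are illustrative assumptions; only the encoder families, the eight attention heads, and the fully connected classification head come from the Methods.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, T5EncoderModel


class MultimodalAttentionFusion(nn.Module):
    """Fuses note and temporal-summary embeddings (Bio+ClinicalBERT) with
    vital-statistics embeddings (Flan-T5 encoder) via multi-head attention,
    then predicts readmission with a fully connected head."""

    def __init__(self, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        # Assumed checkpoints: the abstract names the model families only.
        self.bert = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
        self.t5 = T5EncoderModel.from_pretrained("google/flan-t5-base")
        # Project T5 embeddings into the shared fusion dimension.
        self.proj_t5 = nn.Linear(self.t5.config.d_model, d_model)
        # Eight-head attention over the modality embeddings (per the Methods).
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(d_model, 256), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(256, 1),  # single logit for binary readmission
        )

    @staticmethod
    def _pool(model, tok):
        # First-token pooling ([CLS] for BERT); a simplification for T5,
        # which has no CLS token.
        return model(**tok).last_hidden_state[:, 0]

    def forward(self, notes_tok, temporal_tok, vitals_tok):
        # One pooled embedding per modality -> sequence of 3 "modality tokens".
        notes = self._pool(self.bert, notes_tok)
        temporal = self._pool(self.bert, temporal_tok)
        vitals = self.proj_t5(self._pool(self.t5, vitals_tok))
        seq = torch.stack([notes, temporal, vitals], dim=1)    # (B, 3, d_model)
        fused, _ = self.attn(seq, seq, seq)                    # cross-modality attention
        return self.classifier(fused.mean(dim=1)).squeeze(-1)  # mean-pool, then FC head
```

At evaluation time, the logits would be passed through a sigmoid and thresholded, with the reported metrics computed via standard scikit-learn functions (f1_score, roc_auc_score, accuracy_score, precision_score, recall_score).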
Results: Our multimodal attention fusion model outperformed all other configurations, achieving the highest predictive accuracy and surpassing the zero-shot performance of LLaMA-3.0 by 9% in accuracy. Compared with the average performance of the single-source models, multimodal attention fusion improved accuracy by 14%, F1 score by 17%, and AUROC by 18%, highlighting the advantage of integrating diverse data modalities.
Conclusion: Integrating structured EHR data with unstructured clinical notes improves hospital readmission prediction in an H&N cancer cohort. Our multimodal approach outperforms zero-shot prompting and single-source models, paving the way for better clinical decision support and earlier intervention.