Towards AI Trustworthiness in Healthcare: Prediction, Interpretability, and Uncertainty Quantification
Date
2025
Authors
Wang, Yuqi
Abstract
The increasing integration of Artificial Intelligence (AI) into healthcare promises to revolutionize diagnostics and personalized medicine. These AI systems may be able to non-invasively characterize a patient's underlying phenotype, leading to new insights for diagnosis, prognosis, and treatment planning. Providing meaningful and reliable predictions is therefore fundamental to contemporary AI for healthcare.
However, the most powerful AI models often operate as "black boxes," making their internal decision-making processes opaque to clinicians. This lack of transparency and interpretability creates a critical "trust gap," a major barrier to the adoption of AI in high-stakes clinical environments. Novel methods are urgently needed to close this gap so that AI can be used effectively, safely, and reliably to solve complex clinical problems.
Major contributions of this dissertation research include: (1) the development of a novel data fusion framework based on physics-inspired potential functions to optimally integrate high-dimensional imaging and low-dimensional non-imaging data; (2) the creation of a model-agnostic uncertainty quantification framework, the Concordance-based Predictive Uncertainty (CPU)-Index, to assess the reliability of individual time-to-event predictions; and (3) the design of a generative distributional regression framework using a Conditional Variational Autoencoder (cVAE) to produce personalized reference ranges instead of simple point estimates.
Multi-modal data fusion, i.e., the integration of disparate data sources such as high-dimensional images and low-dimensional clinical records, is a common task in medical AI. Notable limitations of current approaches include: (1) dimensional disparity, where high-dimensional features overpower low-dimensional ones, and (2) heuristic fusion schemes that lack a formal measure of fusion quality. To overcome these challenges, our data fusion framework models the neural network’s feature space as a physical system obeying a Gibbs measure. We use positional encoding to embed low-dimensional clinical data in a higher-dimensional space, and we introduce a quantitative metric, $\gamma$, that represents the ratio of effective data dimensions. The optimal $\gamma$ is identified in a data-driven manner, supporting clinical decision-making.
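To make the encoding step concrete, the sketch below lifts a single normalized clinical variable into a higher-dimensional vector using transformer-style sinusoidal positional encoding. This is only an illustrative assumption: the function name, dimensionality, and frequency schedule are hypothetical, and the dissertation's actual encoding may differ.

```python
import numpy as np

def positional_encode(x, d_model=32, max_scale=100.0):
    """Illustrative sketch: lift a scalar clinical variable x into a
    d_model-dimensional vector via sinusoidal positional encoding
    (transformer-style). Assumes x has been normalized to a bounded
    range beforehand and that d_model is even."""
    # Geometrically spaced frequencies, one per sin/cos pair.
    freqs = max_scale ** (-np.arange(0, d_model, 2) / d_model)
    angles = x * freqs
    enc = np.empty(d_model)
    enc[0::2] = np.sin(angles)  # even dimensions: sine components
    enc[1::2] = np.cos(angles)  # odd dimensions: cosine components
    return enc

# Example: encode a hypothetical normalized lab value.
embedding = positional_encode(0.73)
print(embedding.shape)  # (32,)
```

In a fused model, such embeddings could then be concatenated with imaging features so that neither modality dominates purely by virtue of its raw dimensionality.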
Uncertainty quantification for time-to-event models is another critical challenge, as clinicians need to understand the confidence associated with an individual patient’s prognosis. Our proposed CPU-Index addresses this by bridging two parallel prognostic paradigms: interpretable subgroup analysis (e.g., clinical lexicons) and powerful personalized AI models. The framework operates as follows. For a new patient, we generate two independent similarity rankings of patients in the training set: one based on the clinical lexicon and another based on the AI predictions. The CPU-Index is defined as the discordance between these two rankings. A high index value indicates that the clinical rules and the data-driven model disagree, signaling high uncertainty in the AI prediction for that specific patient.
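The ranking-comparison step can be illustrated with a toy discordance measure. The sketch below uses Kendall's $\tau$ as one concrete choice; the measure, the function name, and the similarity values are assumptions for illustration, not the dissertation's exact definition of the CPU-Index.

```python
import numpy as np
from scipy.stats import kendalltau

def cpu_index(lexicon_sim, model_sim):
    """Toy illustration of the idea behind the CPU-Index: quantify
    disagreement between a similarity ranking derived from a clinical
    lexicon and one derived from AI predictions. Higher values mean
    greater discordance, i.e., higher predictive uncertainty.
    NOTE: Kendall's tau is one possible discordance measure; the
    dissertation's exact formulation may differ."""
    tau, _ = kendalltau(lexicon_sim, model_sim)
    # Map tau in [-1, 1] to a discordance score in [0, 1].
    return (1.0 - tau) / 2.0

# Similarity of a new patient to five training patients, under the
# two paradigms (hypothetical values):
lexicon_sim = np.array([0.9, 0.7, 0.4, 0.2, 0.1])
model_sim   = np.array([0.8, 0.3, 0.6, 0.2, 0.1])
print(f"Illustrative CPU-Index: {cpu_index(lexicon_sim, model_sim):.2f}")
```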
In addition, standard regression models that produce a single point estimate for a continuous clinical variable are often insufficient for clinical decision-making, which requires context and an understanding of plausible ranges. To address this, our cVAE framework learns the entire conditional distribution of an outcome, $p(y|X)$. This generative approach allows for both point prediction and the derivation of full, personalized prediction intervals. To ensure the generated distributions are well-calibrated, we introduce a novel training strategy that uses auxiliary quantile heads as a temporary scaffold. These heads provide additional supervision during training to improve the structure of the learned distribution, leading to more stable and clinically meaningful reference ranges at inference time.
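The auxiliary supervision can be illustrated with the standard pinball (quantile) loss, the usual objective for training quantile heads. The PyTorch sketch below is a minimal example under that assumption; the quantile levels and head outputs are hypothetical, and the dissertation's training strategy presumably combines such a term with the usual cVAE objective.

```python
import torch

def pinball_loss(pred, target, quantile):
    """Pinball (quantile) loss: asymmetric penalty that makes a head
    predict the given quantile of p(y|X). Under-prediction is weighted
    by `quantile`, over-prediction by (1 - quantile)."""
    diff = target - pred
    return torch.mean(torch.maximum(quantile * diff, (quantile - 1) * diff))

# Hypothetical auxiliary heads for the 5th, 50th, and 95th percentiles,
# supervising the shape of the learned conditional distribution.
quantiles = [0.05, 0.50, 0.95]
target = torch.randn(8)                      # batch of outcomes
preds = [torch.randn(8, requires_grad=True)  # one head output per quantile
         for _ in quantiles]
aux_loss = sum(pinball_loss(p, target, q) for p, q in zip(preds, quantiles))
aux_loss.backward()
```

Because the heads serve only as a training-time scaffold, they can be discarded at inference, leaving the generative decoder to produce the personalized reference ranges.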
This dissertation is structured in pairs of chapters: each theoretical framework (Chapters 2, 4, and 6) is immediately followed by its validation in clinical applications (Chapters 3, 5, and 7). The theory of our optimal data fusion framework is covered in Chapter 2. In Chapter 3, this framework is applied to the problems of non-invasively predicting portal venous hypertension and predicting outcomes after stereotactic radiosurgery, where its ability to improve diagnostic and prognostic accuracy is demonstrated. The theoretical development of the CPU-Index is presented in Chapter 4. Chapter 5 then validates this framework through two applications: first, as a corrective tool to improve the specificity of lung cancer screening, and second, as an evaluative tool to compare the inherent uncertainty of different prognostic models for brain metastasis. The distributional regression framework using a modified cVAE is detailed in Chapter 6. Finally, Chapter 7 applies this method to a large-scale, real-world dataset of 145,165 scans to generate personalized liver volume estimates, showcasing its ability to produce more stable reference intervals than traditional methods.
Type
Dissertation
Permalink
https://hdl.handle.net/10161/34064
Citation
Wang, Yuqi (2025). Towards AI Trustworthiness in Healthcare: Prediction, Interpretability, and Uncertainty Quantification. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/34064.
