Outcome Observability in Electronic Health Record-Based Clinical Prediction Models

Limited Access
This item is unavailable until: 2025-11-19

Date

2025

Abstract

Electronic health records (EHRs) are a rich, real-world data source widely used to develop clinical prediction models (CPMs). However, outcomes in EHR data are often not fully observed, because individuals can receive care at multiple institutions or otherwise fall outside the system’s observational reach. This observability problem, particularly when it differs across demographic subgroups (differential observability), can introduce a critical source of bias into EHR-based CPMs. As a result, models may systematically underestimate risk for vulnerable groups, reinforcing existing health inequities.

While observability is a challenge across all EHR-based studies, this dissertation focuses on outcome observability within CPM development. Specifically, it addresses three key questions: how differential observability can bias CPMs, how to estimate the degree of observability, and how to build robust CPMs despite incomplete outcome information. First, after formally defining observability, we demonstrate how differential observability induces algorithmic bias in CPMs. Next, we propose a novel method to estimate and assess the extent of observability using a fully observed external dataset. By reweighting the external data to resemble the target EHR population, the approach provides estimates of both overall and differential observability without requiring direct patient-level linkage. Finally, we address the challenge of constructing CPMs for long-term outcomes with a limited observation window. We frame this as a positive–unlabeled (PU) problem and employ an adversarial domain adaptation method that aligns historical, fully labeled data with a more contemporary, partially labeled target cohort to maintain predictive performance despite shifting patient populations.

This dissertation underscores the importance of accounting for observability when leveraging EHR data for CPMs. By defining and measuring outcome observability, and by adapting modeling strategies to mitigate its impact, it provides a comprehensive framework for handling observability in EHR data when building clinical prediction models.
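
The reweighting idea described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's method. It assumes simple post-stratification on a single hypothetical covariate (an age band), and it assumes that within-stratum event rates in the external data carry over to the EHR population. All names and numbers below are invented for illustration.

```python
from collections import Counter


def stratum_weights(external_strata, target_strata):
    """Post-stratification weights: target share / external share per stratum."""
    ext, tgt = Counter(external_strata), Counter(target_strata)
    n_ext, n_tgt = len(external_strata), len(target_strata)
    # Strata absent from the external data cannot be reweighted (positivity).
    return {s: (tgt[s] / n_tgt) / (ext[s] / n_ext) for s in ext}


def weighted_event_rate(strata, outcomes, weights):
    """Event rate in the external data after reweighting to the target mix."""
    num = sum(weights[s] * y for s, y in zip(strata, outcomes))
    den = sum(weights[s] for s in strata)
    return num / den


# Hypothetical external cohort with complete outcome capture.
external_strata = ["young"] * 60 + ["old"] * 40
external_outcomes = [1] * 6 + [0] * 54 + [1] * 12 + [0] * 28

# Hypothetical EHR cohort: older case mix, 13 events recorded in 100 patients.
target_strata = ["young"] * 20 + ["old"] * 80
target_observed_events = 13

weights = stratum_weights(external_strata, target_strata)
expected_rate = weighted_event_rate(external_strata, external_outcomes, weights)
observed_rate = target_observed_events / len(target_strata)

# Ratio of recorded to expected events: a crude observability estimate.
observability = observed_rate / expected_rate

print(f"expected event rate: {expected_rate:.3f}")
print(f"observed event rate: {observed_rate:.3f}")
print(f"estimated observability: {observability:.3f}")
```

In this toy example, roughly half of the expected events appear in the EHR. Repeating the calculation separately within demographic subgroups would give a rough picture of differential observability, again under the stated assumptions.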

Subjects

Biostatistics, Bioinformatics, Algorithmic Bias, Clinical Prediction Models, Domain Adaptation, Electronic Health Records

Citation

Yan, Mengying (2025). Outcome Observability in Electronic Health Record-Based Clinical Prediction Models. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/32669.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.