dc.description.abstract |
<p>In this dissertation, we present four novel contributions to the field of statistics
with the shared goal of personalizing medicine to individual patients. These methods
are developed to directly address problems in health care through two subfields of
statistics: probabilistic machine learning and causal inference. These projects include
improving predictions of adverse events after surgery and learning the effectiveness
of treatments for specific subgroups and for individuals. We begin the dissertation
in Chapter 1 with a discussion of personalized medicine, the use of electronic health
record (EHR) data, and a brief overview of learning heterogeneous treatment effects.
In Chapter 2, we present a novel algorithm, Predictive Hierarchical Clustering (PHC),
for agglomerative hierarchical clustering of Current Procedural Terminology (CPT)
codes. PHC clusters the subgroups found within our data, rather than individual
observations, so that the resulting clusters optimize the performance of a
classification model, specifically for predicting surgical complications.
In Chapter 3, we develop a hierarchical infinite latent factor model (HIFM) to appropriately
account for the covariance structure across subpopulations in data. We propose a novel
Hierarchical Dirichlet Process shrinkage prior on the loadings matrix that flexibly
captures the underlying structure of our data across subpopulations while sharing
information to improve inference and prediction. We apply this work to the problem
of predicting surgical complications using electronic health record data for geriatric
patients at Duke University Health System (DUHS). The last chapters of the dissertation
address personalized medicine from a causal perspective, where the goal is to understand
how interventions affect individuals rather than full populations. In Chapter 4, we address
heterogeneous treatment effects across subgroups, where guidance for observational
comparisons within subgroups is lacking, as is a connection to classic design principles
for propensity score (PS) analyses. We address these shortcomings by proposing a novel
propensity score method for subgroup analysis (SGA) that seeks to balance existing
strategies in an automatic and efficient way. With the use of overlap weights, we
prove that an overspecified propensity model including interactions between subgroups
and all covariates results in exact covariate balance within subgroups. We pair this result
with variable selection approaches to adjust for a possibly overspecified propensity
score model. Finally, Chapter 5 presents our last contribution, a longitudinal matching
algorithm that aims to predict individual treatment effects of a medication change for
diabetes patients. This project aims to develop a novel and generalizable causal inference
framework for learning heterogeneous treatment effects from EHR data. The key
methodological innovation is to cast the sparse and irregularly spaced EHR time series
into a functional data analysis setting in the design stage to adjust for confounding
that changes over time. We conclude the dissertation and discuss future work in Chapter
6, outlining many directions for continued research on these topics.</p>
|
|