Bayesian interaction estimation with high-dimensional dependent predictors
Humans are constantly exposed to mixtures of different chemicals arising from environmental contamination. While certain compounds, such as mercury and other heavy metals, are well known to be toxic, there are many complex mixtures whose health effects are still unknown. It is of fundamental public health importance to understand how these exposures interact to affect the risk of disease, and to understand the health effects of cumulative exposure to multiple agents. The goal of this thesis is to build data-driven models to tackle major challenges in modern health applications, with a special interest in estimating statistical interactions among correlated exposures. In Chapter 1, we develop a flexible Gaussian process regression model (MixSelect) that simultaneously estimates a complex nonparametric model and provides interpretability. A key component of this approach is the incorporation of a heredity constraint that includes interactions only in the presence of main effects, effectively reducing the dimensionality of the model search. Next, we focus our modeling effort on characterizing the joint variability of chemical exposures using factor models. Chemicals usually co-occur in the environment or in synthetic mixtures; as a result, their exposure levels can be highly correlated. In Chapter 3, we build a Factor analysis for INteractions (FIN) framework that jointly provides dimensionality reduction in the chemical measurements and allows estimation of main effects and interactions. Through appropriate modifications of the factor modeling structure, FIN can accommodate higher order interactions and multivariate outcomes. Further, we extend FIN to survival analysis and exponential families in Chapter 4, as medical studies often collect high-dimensional data and time-to-event outcomes. 
We address these cases through a joint factor analysis modeling approach in which latent factors underlying the predictors are included in a quadratic proportional hazards regression model, and we provide expressions for the induced coefficients on the covariates. In Chapter 5, we combine factor models and nonparametric regression. We build a copula factor model for the chemical exposures and use Bayesian B-splines for flexible dose-response modeling. Finally, in Chapter 6 we propose a post-processing algorithm that allows for identification and interpretation of the factor loadings matrix and can be easily applied to the models described in the previous chapters.
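
As a rough illustration of what "induced coefficients on the covariates" can mean, the sketch below works through the simplest Gaussian case rather than the proportional hazards setting. It assumes a linear factor model x = Lambda eta + eps with standard normal factors and diagonal residual variances, and a quadratic mean for the outcome in the factors. None of these specifics, nor the names `induced_quadratic_regression`, `Lambda`, `psi`, `omega`, and `Omega`, come from the abstract; they are hypothetical choices made for illustration. Under these assumptions, E[eta | x] = A x and Cov(eta | x) = V, so a quadratic regression on the factors implies main effects A' omega and pairwise interactions A' Omega A on the observed exposures.

```python
import numpy as np

def induced_quadratic_regression(Lambda, psi, omega, Omega):
    """Map factor-level coefficients to covariate-level coefficients.

    Assumed model (illustrative only, not taken from the thesis text):
        x = Lambda @ eta + eps,  eta ~ N(0, I_k),  eps ~ N(0, diag(psi))
        E[y | eta] = omega @ eta + eta @ Omega @ eta
    """
    k = Lambda.shape[1]
    Lt_Pinv = Lambda.T / psi                         # Lambda' Psi^{-1}, shape (k, p)
    V = np.linalg.inv(Lt_Pinv @ Lambda + np.eye(k))  # Cov(eta | x)
    A = V @ Lt_Pinv                                  # E[eta | x] = A @ x
    intercept = np.trace(Omega @ V)                  # constant from E[eta' Omega eta | x]
    main_effects = A.T @ omega                       # one coefficient per exposure
    interactions = A.T @ Omega @ A                   # symmetric (p, p) matrix
    return intercept, main_effects, interactions, A, V

# Small synthetic example: p = 6 exposures driven by k = 2 latent factors.
rng = np.random.default_rng(0)
p, k = 6, 2
Lambda = rng.normal(size=(p, k))
psi = rng.uniform(0.5, 1.5, size=p)
omega = rng.normal(size=k)
B = rng.normal(size=(k, k))
Omega = (B + B.T) / 2                                # symmetric quadratic form

intercept, beta, Theta, A, V = induced_quadratic_regression(Lambda, psi, omega, Omega)

# E[y | x] for one exposure vector, written in terms of the covariates only.
x = rng.normal(size=p)
mean_y_given_x = intercept + beta @ x + x @ Theta @ x
```

The point of the sketch is that a handful of factor-level parameters determines all p main effects and all p(p+1)/2 quadratic terms, which is how a factor model can make interaction estimation feasible when exposures are numerous and highly correlated.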

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations
Works are deposited here by their authors, and represent their research and opinions, not that of Duke University. Some materials and descriptions may include offensive content.