Advances in Bayesian Factor Modeling and Scalable Gaussian Process Regression
Date
2020
Abstract
Correlated measurements arise across a diverse array of disciplines such as epidemiology, toxicology, genomics, economics, and meteorology. Factor models describe the association between variables by assuming that latent factors drive the structured variation among them. Gaussian process (GP) models, on the other hand, describe the association between variables using a distance-based covariance kernel. This dissertation introduces two novel extensions of Bayesian factor models driven by applied problems, and then proposes an algorithm that allows for scalable approximate Bayesian GP sampling. First, the FActor Regression for Verbal Autopsy (FARVA) model is developed for predicting the cause of death (COD) and cause-specific mortality fraction in low-resource settings based on verbal autopsies. Both the mean of and the association between symptoms provide information used to differentiate decedents across COD groups. This class of hierarchical factor regression models avoids restrictive assumptions of standard methods, allows both the mean and covariance to vary with COD category, and can include covariate information on the decedent, region, or events surrounding death. Next, the Bayesian partially Supervised Sparse and Smooth Factor Analysis (BS3FA) model is developed to enable toxicologists, who are faced with a rising tide of chemicals under regulation and in use, to choose which chemicals to prioritize for screening and to predict the toxicity of as-yet-unscreened chemicals based on their molecular structure. Latent factors driving structured variability are assumed to be shared between the molecular structure observations and dose-response observations from high-throughput screening. These shared latent factors allow the model to learn a distance between chemicals that is targeted to toxicity, rather than one based on molecular structure alone.
Finally, the Fast Increased Fidelity Approximate GP (FIFA-GP) allows the association between observations to be modeled by a high-fidelity Gaussian process approximation even when the number of observations is on the order of 10^5. A sampling algorithm that runs in O(n log^2(n)) time is described, along with a proof that the Kullback-Leibler divergence of the approximation to the true posterior can be made arbitrarily small.
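To make the contrast between the two model families concrete, the following is a minimal sketch of the covariance structure each one implies. It is an illustration only, not the dissertation's FARVA, BS3FA, or FIFA-GP code: the squared-exponential kernel, the function names, and the toy numbers are all assumptions chosen for the example, and the abstract does not say which kernel the dissertation uses.

```python
import math

def sq_exp_kernel(x, y, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((x - y) / length_scale) ** 2)

def gp_covariance(points, length_scale=1.0, variance=1.0):
    """Dense n x n GP covariance matrix; covariance decays with distance."""
    return [[sq_exp_kernel(a, b, length_scale, variance) for b in points]
            for a in points]

def factor_covariance(loadings, noise_vars):
    """Covariance implied by a factor model: Lambda Lambda^T + diag(noise_vars).

    loadings is a p x k list of lists (p variables, k latent factors).
    """
    p = len(loadings)
    cov = [[sum(a * b for a, b in zip(loadings[j], loadings[k]))
            for k in range(p)] for j in range(p)]
    for j in range(p):
        cov[j][j] += noise_vars[j]  # idiosyncratic noise sits on the diagonal
    return cov

# Hypothetical toy inputs: three locations and a one-factor model on three variables.
points = [0.0, 0.5, 3.0]
K = gp_covariance(points)

loadings = [[1.0], [0.5], [0.0]]
S = factor_covariance(loadings, [0.1, 0.1, 0.1])
```

In the GP matrix `K`, nearby points (0.0 and 0.5) are more strongly correlated than distant ones (0.0 and 3.0). In the factor matrix `S`, two variables are correlated only to the extent that both load on the shared factor. The dense matrix built here takes O(n^2) time and memory to form, which is why exact GP computation is costly at the n ~ 10^5 scale mentioned above and an approximation becomes attractive.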
Citation
Moran, Kelly R. (2020). Advances in Bayesian Factor Modeling and Scalable Gaussian Process Regression. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/20849.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.