Advances in Bayesian Factor Modeling and Scalable Gaussian Process Regression

Date

2020

Repository Usage Stats

393 views
502 downloads

Abstract

Correlated measurements arise across a diverse array of disciplines such as epidemiology, toxicology, genomics, economics, and meteorology. Factor models describe the association between variables by assuming that latent factors drive structured variation among them. Gaussian process (GP) models, on the other hand, describe the association between variables using a distance-based covariance kernel. This dissertation introduces two novel extensions of Bayesian factor models driven by applied problems, and then proposes an algorithm that allows for scalable approximate Bayesian GP sampling.
First, the FActor Regression for Verbal Autopsy (FARVA) model is developed for predicting the cause of death and cause-specific mortality fraction in low-resource settings based on verbal autopsies. Both the mean and the association between symptoms provide information used to differentiate decedents across cause-of-death (COD) groups. This class of hierarchical factor regression models avoids restrictive assumptions of standard methods, allows both the mean and covariance to vary with COD category, and can include covariate information on the decedent, region, or events surrounding death.
Next, the Bayesian partially Supervised Sparse and Smooth Factor Analysis (BS3FA) model is developed to enable toxicologists, who are faced with a rising tide of chemicals under regulation and in use, to choose which chemicals to prioritize for screening and to predict the toxicity of as-yet-unscreened chemicals based on their molecular structure. Latent factors driving structured variability are assumed to be shared between the molecular structure observations and dose-response observations from high-throughput screening. These shared latent factors allow the model to learn a distance between chemicals that is targeted to toxicity, rather than one based on molecular structure alone.
Finally, the Fast Increased Fidelity Approximate GP (FIFA-GP) allows the association between observations to be modeled by a high-fidelity Gaussian process approximation even when the number of observations is on the order of 10^5. A sampling algorithm that runs in O(n log^2(n)) time is described, along with a proof that the approximation's Kullback-Leibler divergence to the true posterior can be made arbitrarily small.

Citation

Moran, Kelly R. (2020). Advances in Bayesian Factor Modeling and Scalable Gaussian Process Regression. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/20849.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.