Browsing by Subject "Gaussian processes"
Item Open Access Bayesian interaction estimation with high-dimensional dependent predictors (2021) Ferrari, Federico
Humans are constantly exposed to mixtures of different chemicals arising from environmental contamination. While certain compounds, such as heavy metals and mercury, are well known to be toxic, there are many complex mixtures whose health effects are still unknown. It is of fundamental public health importance to understand how these exposures interact to affect the risk of disease and the health effects of cumulative exposure to multiple agents. The goal of this thesis is to build data-driven models that tackle major challenges in modern health applications, with a special interest in estimating statistical interactions among correlated exposures. In Chapter 1, we develop a flexible Gaussian process regression model (MixSelect) that simultaneously estimates a complex nonparametric model and provides interpretability. A key component of this approach is the incorporation of a heredity constraint, which only includes interactions in the presence of the corresponding main effects, effectively reducing the dimensionality of the model search. Next, we focus our modeling effort on characterizing the joint variability of chemical exposures using factor models. Chemicals usually co-occur in the environment or in synthetic mixtures; as a result, their exposure levels can be highly correlated. In Chapter 3, we build a Factor analysis for INteractions (FIN) framework that jointly provides dimensionality reduction of the chemical measurements and estimates main effects and interactions. Through appropriate modifications of the factor modeling structure, FIN can accommodate higher-order interactions and multivariate outcomes. Further, we extend FIN to survival analysis and exponential families in Chapter 4, as medical studies often collect high-dimensional data and time-to-event outcomes. We address these cases through a joint factor analysis modeling approach in which latent factors underlying the predictors are included in a quadratic proportional hazards regression model, and we provide expressions for the induced coefficients on the covariates. In Chapter 5, we combine factor models and nonparametric regression. We build a copula factor model for the chemical exposures and use Bayesian B-splines for flexible dose-response modeling. Finally, in Chapter 6 we propose a post-processing algorithm that allows for identification and interpretation of the factor loadings matrix and can be easily applied to the models described in the previous chapters.
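The heredity constraint mentioned above lends itself to a short illustration. The sketch below is not the MixSelect code; the function name and indices are hypothetical. It only shows how a strong-heredity rule shrinks the interaction search: a pair is considered only when both of its main effects are already in the model.

```python
# Illustrative sketch (not the MixSelect implementation): under strong
# heredity, an interaction x_j * x_k is a candidate only when both main
# effects are already included, shrinking the search over interactions.
from itertools import combinations

def heredity_interactions(active_main_effects):
    """Return interaction pairs allowed under strong heredity.

    active_main_effects : iterable of predictor indices whose main effects
    are currently in the model (hypothetical bookkeeping).
    """
    active = sorted(set(active_main_effects))
    return list(combinations(active, 2))

# Example: with 20 exposures but only 4 active main effects, the candidate
# set drops from C(20, 2) = 190 pairs to C(4, 2) = 6 pairs.
print(heredity_interactions([2, 5, 11, 17]))
# -> [(2, 5), (2, 11), (2, 17), (5, 11), (5, 17), (11, 17)]
```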
Item Open Access Gaussian Process Kernels for Cross-Spectrum Analysis in Electrophysiological Time Series (2016) Ulrich, Kyle Richard
Multi-output Gaussian processes provide a convenient framework for multi-task problems. An illustrative and motivating example of a multi-task problem is multi-region electrophysiological time-series data, where experimentalists are interested in both power and phase coherence between channels. Recently, the spectral mixture (SM) kernel was proposed to model the spectral density of a single task in a Gaussian process framework. This work develops a novel covariance kernel for multiple outputs, called the cross-spectral mixture (CSM) kernel. This new, flexible kernel represents both the power and phase relationship between multiple observation channels. The expressive capabilities of the CSM kernel are demonstrated through implementation of 1) a Bayesian hidden Markov model, where the emission distribution is a multi-output Gaussian process with a CSM covariance kernel, and 2) a Gaussian process factor analysis model, where factor scores represent the utilization of cross-spectral neural circuits. Results are presented for measured multi-region electrophysiological data.
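As a point of reference, the single-output spectral mixture (SM) kernel that the CSM kernel builds on has a simple closed form. The NumPy sketch below evaluates it with illustrative parameters; the cross-channel phase modeling that distinguishes the CSM kernel is not shown.

```python
# Minimal sketch of the single-output spectral mixture (SM) kernel,
# k(tau) = sum_q w_q * exp(-2*pi^2 * tau^2 * v_q) * cos(2*pi * mu_q * tau),
# where each component q has weight w_q, spectral mean (frequency) mu_q,
# and spectral variance v_q. Parameters below are illustrative.
import numpy as np

def spectral_mixture_kernel(x1, x2, weights, means, variances):
    """Covariance matrix between 1-D input vectors x1 and x2."""
    tau = x1[:, None] - x2[None, :]          # pairwise lags
    K = np.zeros_like(tau, dtype=float)
    for w, mu, v in zip(weights, means, variances):
        K += w * np.exp(-2.0 * np.pi**2 * tau**2 * v) * np.cos(2.0 * np.pi * mu * tau)
    return K

# Example: one slow component plus one oscillatory component near 8 Hz.
t = np.linspace(0.0, 1.0, 50)
K = spectral_mixture_kernel(t, t, weights=[1.0, 0.5], means=[0.0, 8.0],
                            variances=[0.5, 1.0])
```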
Item Open Access Gaussian Process-Based Models for Clinical Time Series in Healthcare (2018) Futoma, Joseph David
Clinical prediction models offer the ability to help physicians make better data-driven decisions that can improve patient outcomes. Given the wealth of data available with the widespread adoption of electronic health records, more flexible statistical models are required that can account for the messiness and complexity of this data. In this dissertation we focus on developing models for clinical time series, as most data within healthcare is collected longitudinally and it is important to take this structure into account. Models built on Gaussian processes are natural in this setting of irregularly sampled, noisy time series with many missing values. In addition, they have the added benefit of accounting for and quantifying uncertainty, which can be extremely useful in medical decision making. In this dissertation, we develop new Gaussian process-based models for medical time series along with associated algorithms for efficient inference on large-scale electronic health records data. We apply these models to several real healthcare applications, using local data obtained from the Duke University healthcare system.
In Chapter 1 we give a brief overview of clinical prediction models, electronic health records, and Gaussian processes. In Chapter 2, we develop several Gaussian process models for clinical time series in the context of chronic kidney disease management. We show how our proposed joint model for longitudinal and time-to-event data and model for multivariate time series can make accurate predictions about a patient's future disease trajectory. In Chapter 3, we combine multi-output Gaussian processes with a downstream black-box deep recurrent neural network. We apply this modeling framework to clinical time series to improve early detection of sepsis among patients in the hospital, and show that the Gaussian process preprocessing layer both allows for uncertainty quantification and acts as a form of data augmentation to reduce overfitting. In Chapter 4, we again use multi-output Gaussian processes as a preprocessing layer in model-free deep reinforcement learning. Here the goal is to learn optimal treatments for sepsis given clinical time series and historical treatment decisions taken by clinicians, and we show that the Gaussian process preprocessing layer and the use of a recurrent architecture offer improvements over standard deep reinforcement learning methods. We conclude in Chapter 5 with a summary of areas for future work and a discussion of practical considerations and challenges involved in deploying machine learning models into actual clinical practice.
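A heavily simplified, single-output version of the preprocessing idea can be sketched with scikit-learn: fit a GP to irregularly sampled vitals and resample the mean and uncertainty onto a regular grid for a downstream recurrent network. The dissertation's multi-output GP is trained jointly with the network, which is not attempted here; the measurement times and values below are hypothetical.

```python
# Hedged sketch: GP resampling of an irregular clinical time series onto a
# regular grid, as a stand-in for the multi-output GP preprocessing layer
# described above. Times (hours) and heart-rate values are made up.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

t_obs = np.array([0.3, 1.1, 2.7, 4.0, 7.5, 9.2])[:, None]   # irregular times
y_obs = np.array([88., 92., 97., 95., 101., 99.])           # noisy vitals

kernel = 1.0 * RBF(length_scale=2.0) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t_obs, y_obs)

# Regular hourly grid for the recurrent model; the predictive std can be fed
# as an extra input channel or used to draw samples for data augmentation.
t_grid = np.arange(0.0, 10.0, 1.0)[:, None]
mean, std = gp.predict(t_grid, return_std=True)
```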
Item Open Access Recent Advances on the Design, Analysis and Decision-making with Expensive Virtual Experiments (2024) Ji, Yi
With breakthroughs in virtual experimentation, computer simulation has been replacing physical experiments that are prohibitively expensive or infeasible to perform at a large scale. However, as the system becomes more complex and realistic, such simulations can be extremely time-consuming, and simulating the entire parameter space becomes impractical. One solution is computer emulation, which builds a predictive model based on a handful of simulation runs. The Gaussian process is a popular emulator for this purpose, used in many physics and engineering applications. In particular, for complicated scientific phenomena like the Quark-Gluon Plasma, employing a multi-fidelity emulator to pool information from multi-fidelity simulation data may enhance predictive performance while simultaneously reducing simulation costs. In this dissertation, we explore two novel approaches for multi-fidelity Gaussian process modeling. The first model is the Graphical Multi-fidelity Gaussian Process (GMGP) model, which embeds scientific dependencies among multi-fidelity data in a directed acyclic graph (DAG). The second model we present is the Conglomerate Multi-fidelity Gaussian Process (CONFIG) model, applicable to scenarios where the accuracy of a simulator is controlled by multiple continuous fidelity parameters.
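For orientation, the sketch below implements a classical two-fidelity, Kennedy-O'Hagan-style emulator of the form y_hi(x) ≈ rho * y_lo(x) + delta(x), the kind of recursion that models such as GMGP and CONFIG generalize. The toy simulators, design points, and scale estimate are illustrative, not the thesis models.

```python
# Hedged sketch of a two-fidelity GP emulator (not GMGP or CONFIG):
# emulate the cheap simulator, estimate a scale factor rho from the few
# expensive runs, then emulate the discrepancy delta(x).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
f_lo = lambda x: np.sin(8 * x)                    # cheap toy simulator
f_hi = lambda x: 1.2 * np.sin(8 * x) + 0.3 * x    # expensive toy simulator

X_lo = rng.uniform(0, 1, (30, 1)); y_lo = f_lo(X_lo).ravel()
X_hi = rng.uniform(0, 1, (8, 1));  y_hi = f_hi(X_hi).ravel()

gp_lo = GaussianProcessRegressor(RBF(0.2), normalize_y=True).fit(X_lo, y_lo)

# Least-squares estimate of rho at the high-fidelity design, then a GP on
# the remaining discrepancy.
pred_lo_at_hi = gp_lo.predict(X_hi)
rho = np.dot(pred_lo_at_hi, y_hi) / np.dot(pred_lo_at_hi, pred_lo_at_hi)
gp_delta = GaussianProcessRegressor(RBF(0.2), normalize_y=True).fit(
    X_hi, y_hi - rho * pred_lo_at_hi)

def emulate_high(X_new):
    return rho * gp_lo.predict(X_new) + gp_delta.predict(X_new)
```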
Software engineering is another domain relying heavily on virtual experimentation. In order to ensure the robustness of a new software application, it must go through extensive testing and validation before production. Such testing is typically carried out through virtual experimentation and can require substantial computing resources, particularly as the system complexity grows. Fault localization is a key step in software testing as it pinpoints root causes of failures based on executed test case outcomes. However, existing fault localization techniques are mostly deterministic and provide limited insight into the probabilistic risk of failure-inducing combinations. To address this limitation, we present a novel Bayesian Fault Localization (BayesFLo) framework for software testing, yielding a principled and probabilistic ranking of suspicious inputs for identifying the root causes of software failures.
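As a toy illustration of probabilistic fault ranking (deliberately not the BayesFLo model), each candidate input setting below receives a Beta posterior over its failure rate from the pass/fail outcomes of the tests that cover it, and candidates are ranked by posterior mean. The test cases and input names are hypothetical.

```python
# Toy Beta-Bernoulli ranking of suspicious input settings from test outcomes.
from collections import defaultdict

# Hypothetical test cases: (set of input settings exercised, passed?)
tests = [
    ({"browser=ff", "os=linux"}, True),
    ({"browser=ff", "os=win"}, False),
    ({"browser=chrome", "os=win"}, True),
    ({"browser=ff", "os=win"}, False),
]

counts = defaultdict(lambda: [0, 0])           # setting -> [failures, runs]
for covered, passed in tests:
    for setting in covered:
        counts[setting][0] += (not passed)
        counts[setting][1] += 1

# Beta(1, 1) prior; posterior mean failure probability per setting.
ranking = sorted(((s, (f + 1) / (n + 2)) for s, (f, n) in counts.items()),
                 key=lambda kv: -kv[1])
print(ranking)   # "os=win" and "browser=ff" float to the top in this toy data
```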
Item Open Access Using Gaussian Processes for the Calibration and Exploration of Complex Computer Models (2014) Coleman-Smith, Christopher
Cutting-edge research problems require the use of complicated and computationally expensive computer models. I present a practical overview of the design and analysis of computer experiments in high-energy nuclear physics and astrophysics. The aim of these experiments is to infer credible ranges for certain fundamental parameters of the underlying physical processes through the analysis of model output and experimental data.
To be truly useful, computer models must be calibrated against experimental data. Gaining an understanding of the response of expensive models across the full range of inputs can be a slow and painful process. Gaussian Process emulators can be an efficient and informative surrogate for expensive computer models and prove to be an ideal mechanism for exploring the response of these models to variations in their inputs.
A sensitivity analysis can be performed on these model emulators to characterize and quantify the relationship between model input parameters and predicted observable properties. The result of this analysis provides the user with information about which parameters are most important and most likely to affect the prediction of a given observable. Sensitivity analysis allows us to identify which model parameters can be most efficiently constrained by the given observational data set.
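A minimal sketch of an emulator-based main-effect analysis, assuming a toy two-input simulator: fit a GP emulator to a handful of runs, sweep one input over its range while averaging the emulator's prediction over the others, and compare the variance each input explains. The simulator, design, and kernel choice below are illustrative stand-ins for the expensive physics codes described here.

```python
# Hedged sketch: GP emulator plus a first-order (main-effect) sensitivity
# measure computed by sweeping one input and averaging over the rest.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def simulator(theta):                      # toy stand-in for an expensive code
    return np.sin(3 * theta[:, 0]) + 0.1 * theta[:, 1]

rng = np.random.default_rng(2)
design = rng.uniform(0, 1, (20, 2))        # small design of simulator runs
emulator = GaussianProcessRegressor(Matern(nu=2.5), normalize_y=True).fit(
    design, simulator(design))

def main_effect_variance(j, grid_size=21, n_mc=2000):
    """Variance of the emulator's mean response as input j sweeps its range."""
    grid = np.linspace(0, 1, grid_size)
    base = rng.uniform(0, 1, (n_mc, 2))
    means = []
    for g in grid:
        pts = base.copy()
        pts[:, j] = g
        means.append(emulator.predict(pts).mean())
    return np.var(means)

effects = [main_effect_variance(j) for j in range(2)]
print(effects)   # input 0 dominates for this toy response
```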
In this thesis I describe a range of techniques for the calibration and exploration of the complex and expensive computer models so common in modern physics research. These statistical methods are illustrated with examples drawn from the fields of high energy nuclear physics and galaxy formation.