On Bayesian Analyses of Functional Regression, Correlated Functional Data and Non-homogeneous Computer Models
Current frontiers in complex stochastic modeling of high-dimensional processes include major emphases on so-called functional data: problems in which the data are snapshots of curves and surfaces representing fundamentally important scientific quantities. This thesis explores new Bayesian methodologies for functional data analysis.
The first part of the thesis places emphasis on the role of factor models in functional data analysis. Data reduction becomes mandatory when dealing with such high-dimensional data, more so when data are available on a large number of individuals. In Chapter 2 we present a novel Bayesian framework which employs a latent factor construction to represent each variable by a low dimensional summary. Further, we explore the important issue of modeling and analyzing the relationship of functional data with other covariate and outcome variables simultaneously measured on the same subjects.
The second part of the thesis is concerned with the analysis of circadian data. The focus is on the identification of circadian genes that is, genes whose expression levels appear to be rhythmic through time with a period of approximately 24 hours. While addressing this goal, most of the current literature does not account for the potential dependence across genes. In Chapter 4, we propose a Bayesian approach which employs latent factors to accommodate dependence and verify patterns and relationships between genes, while representing the true gene expression trajectories in the Fourier domain allows for inference on period, phase, and amplitude of the signal.
The third part of the thesis is concerned with the statistical analysis of computer models (simulators). The heavy computational demand of these input-output maps calls for statistical techniques that quickly estimate the surface output at untried inputs given a few preliminary runs of the simulator at a set design points. In this regard, we propose a Bayesian methodology based on a non-stationary Gaussian process. Relying on a model-based assessment of uncertainty, we envision a sequential design technique which helps choosing input points where the simulator should be run to minimize the uncertainty in posterior surface estimation in an optimal way. The proposed non-stationary approach adapts well to output surfaces of unconstrained shape.
Latent factor models
Non-stationary Gaussian process
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations