dc.description.abstract |
<p>Identifying a lower-dimensional latent space for representation of high-dimensional
observations is of significant importance in numerous biomedical and machine learning
applications. In many such applications, it is now routine to collect data where the
dimensionality of the outcomes is comparable to or even larger than the number of available
observations. Motivated in particular by the problem of predicting the risk of impending
diseases from massive gene expression and single nucleotide polymorphism profiles,
this dissertation focuses on building parsimonious models and computational schemes
for high-dimensional continuous and unordered categorical data, while also studying
theoretical properties of the proposed methods. Sparse factor modeling is fast becoming
a standard tool for parsimonious modeling of such massive-dimensional data, and the
content of this thesis is specifically directed towards methodological and theoretical
developments in Bayesian sparse factor models.</p><p>The first three chapters of the
thesis study sparse factor models for high-dimensional continuous data. A class
of shrinkage priors on factor loadings is introduced with attractive computational
properties, with operating characteristics explored through a number of simulated
and real data examples. In spite of the methodological advances over the past decade,
theoretical justifications in high-dimensional factor models are scarce in the Bayesian
literature. Part of the dissertation focuses on exploring estimation of high-dimensional
covariance matrices using a factor model and studying the rate of posterior contraction
as both the sample size and dimensionality increase.</p><p>To relax the usual assumption
of a linear relationship between the latent and observed variables in a standard factor
model, extensions to a non-linear latent factor model are also considered.</p><p>Although
Gaussian latent factor models are routinely used for modeling of dependence in continuous,
binary and ordered categorical data, they lead to challenging computation and complex
modeling structures for unordered categorical variables. As an alternative, a novel
class of simplex factor models for massive-dimensional and enormously sparse contingency
table data is proposed in the second part of the thesis. An efficient MCMC scheme
is developed for posterior computation and the methods are applied to modeling dependence
in nucleotide sequences and prediction from high-dimensional categorical features.
Building on a connection between the proposed model and sparse tensor decompositions,
we propose new classes of nonparametric Bayesian models for testing associations between
a massive-dimensional vector of genetic markers and a phenotypical outcome.</p>
|
|