
dc.contributor.advisor Dunson, David B en_US
dc.contributor.author Bhattacharya, Anirban en_US
dc.date.accessioned 2012-05-25T20:22:01Z
dc.date.available 2012-05-25T20:22:01Z
dc.date.issued 2012 en_US
dc.identifier.uri http://hdl.handle.net/10161/5606
dc.description Dissertation en_US
dc.description.abstract <p>Identifying a lower-dimensional latent space for representation of high-dimensional observations is of significant importance in numerous biomedical and machine learning applications. In many such applications, it is now routine to collect data where the dimensionality of the outcomes is comparable to or even larger than the number of available observations. Motivated in particular by the problem of predicting the risk of impending diseases from massive gene expression and single nucleotide polymorphism profiles, this dissertation focuses on building parsimonious models and computational schemes for high-dimensional continuous and unordered categorical data, while also studying theoretical properties of the proposed methods. Sparse factor modeling is fast becoming a standard tool for parsimonious modeling of such massive-dimensional data, and the content of this thesis is specifically directed towards methodological and theoretical developments in Bayesian sparse factor models.</p><p>The first three chapters of the thesis study sparse factor models for high-dimensional continuous data. A class of shrinkage priors on factor loadings with attractive computational properties is introduced, and its operating characteristics are explored through a number of simulated and real data examples. In spite of the methodological advances over the past decade, theoretical justifications in high-dimensional factor models are scarce in the Bayesian literature. Part of the dissertation focuses on exploring estimation of high-dimensional covariance matrices using a factor model and studying the rate of posterior contraction as both the sample size and dimensionality increase. 
</p><p>To relax the usual assumption of a linear relationship between the latent and observed variables in a standard factor model, extensions to a non-linear latent factor model are also considered.</p><p>Although Gaussian latent factor models are routinely used for modeling of dependence in continuous, binary and ordered categorical data, they lead to challenging computation and complex modeling structures for unordered categorical variables. As an alternative, a novel class of simplex factor models for massive-dimensional and enormously sparse contingency table data is proposed in the second part of the thesis. An efficient MCMC scheme is developed for posterior computation, and the methods are applied to modeling dependence in nucleotide sequences and prediction from high-dimensional categorical features. Building on a connection between the proposed model and sparse tensor decompositions, we propose new classes of nonparametric Bayesian models for testing associations between a massive-dimensional vector of genetic markers and a phenotypical outcome.</p> en_US
dc.subject Statistics en_US
dc.subject Bayesian en_US
dc.subject Contingency table en_US
dc.subject Convergence rate en_US
dc.subject Factor model en_US
dc.subject High-dimensional en_US
dc.subject Tensor factorization en_US
dc.title Bayesian Semi-parametric Factor Models en_US
dc.type Dissertation en_US
dc.department Statistical Science en_US

