Inference of Low-Dimensional Latent Structure in High-Dimensional Data
Learning a latent model that yields a sparse or low-dimensional representation of high-dimensional data has attracted significant attention for many years. This thesis applies Bayesian nonparametric methods to learn such models for images, dynamic data, and documents. The thesis consists of three parts.
First, nonparametric Bayesian methods are considered for recovery of imagery based upon compressive measurements. A truncated beta-Bernoulli process is employed to infer an appropriate dictionary for the test data, and also for image recovery. In the context of compressive sensing, learned dictionaries yield significantly better image recovery than standard orthonormal image expansions. The compressive-measurement projections are also optimized for the learned dictionary. Spatial inter-relationships within imagery are exploited through use of the Dirichlet and probit stick-breaking processes. Several example results are presented, with comparisons to other state-of-the-art methods in the literature.
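As an illustration of how a truncated beta-Bernoulli prior encourages sparse dictionary usage, the following minimal sketch draws atom-usage probabilities and binary indicators from a finite approximation. The function name, the parameters `a` and `b`, and the specific parameterization pi_k ~ Beta(a/K, b(K-1)/K) are assumptions chosen for illustration; the abstract does not specify the exact form used in the thesis, and no inference step is shown.

```python
import random


def sample_truncated_beta_bernoulli(n_samples, n_atoms, a=1.0, b=1.0, seed=0):
    """Draw from a finite (truncated) beta-Bernoulli prior.

    Assumed parameterization (hypothetical, for illustration only):
        pi_k ~ Beta(a / K, b * (K - 1) / K)   usage probability of atom k
        z_ik ~ Bernoulli(pi_k)                whether sample i uses atom k
    """
    if n_atoms < 2:
        raise ValueError("n_atoms must be at least 2 for this parameterization")
    rng = random.Random(seed)
    K = n_atoms
    # One usage probability per dictionary atom, shared by all samples.
    pi = [rng.betavariate(a / K, b * (K - 1) / K) for _ in range(K)]
    # Binary indicator matrix: rows are samples, columns are atoms.
    z = [[1 if rng.random() < pi[k] else 0 for k in range(K)]
         for _ in range(n_samples)]
    return pi, z


if __name__ == "__main__":
    pi, z = sample_truncated_beta_bernoulli(n_samples=5, n_atoms=64)
    for row in z:
        print(sum(row), "of", len(row), "atoms active")
```

Under this assumed parameterization the expected number of active atoms per sample is aK/(a + b(K-1)), which approaches a/b as the truncation level K grows, so each sample uses only a few atoms even when the dictionary is large.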
Second, hierarchical Bayesian methods are employed to learn a reversible statistical embedding. The proposed embedding procedure is connected to spectral embedding methods, for example, diffusion maps and Isomap, yielding a new statistical spectral framework. The proposed approach allows one to discard the training data when embedding new data, allows synthesis of high-dimensional data from the embedding space, and provides accurate estimation of the latent-space dimensionality. Hierarchical Bayesian methods are also developed to learn a nonlinear dynamic model in the low-dimensional embedding space, allowing joint analysis of multiple types of dynamic data, sharing strength and inferring inter-relationships. In addition to analyzing dynamic data, the learned model also yields effective synthesis. Example results are presented for statistical embedding, latent-space dimensionality estimation, and analysis and synthesis of high-dimensional (dynamic) motion-capture data.
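For readers unfamiliar with the spectral methods mentioned above, the sketch below builds the row-stochastic transition matrix that classical diffusion maps start from. It shows only that standard first step, not the statistical embedding proposed in the thesis; the function name and the `bandwidth` parameter are hypothetical.

```python
import math


def diffusion_transition_matrix(points, bandwidth=1.0):
    """Build a row-stochastic Markov matrix from a Gaussian kernel.

    points: list of equal-length coordinate tuples.
    bandwidth: kernel scale (illustrative parameter, must be positive).
    """
    n = len(points)
    # Gaussian affinity between every pair of points.
    kernel = [
        [
            math.exp(
                -sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                / bandwidth
            )
            for j in range(n)
        ]
        for i in range(n)
    ]
    # Normalize each row so it sums to one (random-walk transition probabilities).
    return [[w / sum(row) for w in row] for row in kernel]


if __name__ == "__main__":
    P = diffusion_transition_matrix([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)])
    for row in P:
        print([round(p, 4) for p in row])
```

In classical diffusion maps, the embedding coordinates are then taken from the leading nontrivial eigenvectors of this matrix, scaled by powers of the corresponding eigenvalues.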
Third, a new hierarchical tree-based topic model is developed, based on nonparametric Bayesian techniques. The model has two distinguishing attributes: (i) a child node in the tree may have more than one parent, with the goal of eliminating redundant sub-topics deep in the tree; and (ii) sub-topics are parsimonious, obtained by removing redundant usage of words at multiple scales. The depth and width of the tree are unbounded within the prior, with a retrospective sampler employed to adaptively infer the appropriate tree size based upon the corpus under study. Excellent quantitative results are obtained on five standard data sets, and the inferred tree structure is also found to be highly interpretable.
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations