dc.description.abstract |
<p>Mixture modeling of continuous data is an extremely effective and popular method for density estimation and clustering. However, as the size of the data grows, both in terms of dimension and number of observations, many modeling and computational problems arise. In the Bayesian setting, computational methods for posterior inference become intractable as the number of observations and/or possible clusters grows large. Furthermore, relabeling in sampling methods becomes increasingly difficult to address as the data grow. This thesis addresses computational and methodological solutions to these problems by utilizing modern computational hardware and new methodology.
Novel approaches for parsimonious covariance modeling and information sharing across
multiple data sets are then built upon these computational improvements.</p><p>Chapter
1 introduces the fundamental modeling approaches in mixture modeling, including Dirichlet
processes and posterior inference using Gibbs sampling. Chapter 2 describes the utilization
of graphics processing units (GPUs) for massive gains in computational performance in both
mixture models and general Bayesian modeling. Chapter 3 introduces a new relabeling
approach in mixture modeling that can be scaled far beyond current methodology to massive-data and high-dimensional settings. Chapter 4 generalizes Chapters 2 and 3
to the hierarchical Dirichlet process setting to "borrow strength" from multiple studies
in classification problems in flow cytometry. Chapter 5 develops a novel approach for sparse covariance estimation based on sparse, full-rank orthogonal matrix estimation. These new methods are applied to classification in a mixture modeling setting with measurement error. Finally, Chapter 6 summarizes the work presented in this thesis and outlines exciting areas for future research.</p>
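<p>For reference, the Dirichlet process mixture model underlying the chapters above can be sketched in its generic textbook form; this is an illustrative formulation rather than the exact specification used in the thesis, with F denoting the component kernel (e.g., a multivariate Gaussian for continuous data), G_0 the base measure, and alpha the concentration parameter:</p>
\[
\begin{aligned}
y_i \mid \theta_i &\sim F(\theta_i), \qquad i = 1, \dots, n,\\
\theta_i \mid G &\sim G,\\
G &\sim \mathrm{DP}(\alpha, G_0).
\end{aligned}
\]
<p>Posterior inference for models of this form via Gibbs sampling, together with the hierarchical Dirichlet process extension for borrowing strength across studies, is the setting in which the computational and relabeling contributions described above apply.</p>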