Clustering Multiple Related Datasets with a Hierarchical Dirichlet Process
I consider the problem of clustering multiple related groups of data. My approach entails mixture models in the context of hierarchical Dirichlet processes, focusing on their ability to perform inference on the unknown number of components in the mixture, as well as to facilitate the sharing of information and borrowing of strength across the various data groups. Here, I build upon the hierarchical Dirichlet process model proposed by Muller <italics>et al.</italics> (2004), revising some relevant aspects of the model, as well as improving the MCMC sampler's convergence by combining local Gibbs sampler moves with global Metropolis-Hastings split-merge moves. I demonstrate the strengths of my model by employing it to cluster both synthetic and real datasets.
Bayesian statistics
Clustering
Dirichlet process
Hierarchical Dirichlet process
Nonparametric Bayesian models

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Masters Theses
Works are deposited here by their authors, and represent their research and opinions, not that of Duke University. Some materials and descriptions may include offensive content. More info