Clustering Multiple Related Datasets with a Hierarchical Dirichlet Process

dc.contributor.advisor

West, Mike

dc.contributor.author

de Oliveira Sales, Ana Paula

dc.date.accessioned

2012-05-29T16:36:52Z

dc.date.available

2012-11-25T05:30:16Z

dc.date.issued

2011

dc.department

Statistical Science

dc.description.abstract

I consider the problem of clustering multiple related groups of data. My approach uses mixture models built on hierarchical Dirichlet processes, focusing on their ability to infer the unknown number of mixture components and to share information, or borrow strength, across the data groups. I build upon the hierarchical Dirichlet process model proposed by Müller et al. (2004), revising several aspects of the model and improving the convergence of the MCMC sampler by combining local Gibbs sampler moves with global Metropolis-Hastings split-merge moves. I demonstrate the strengths of my model by using it to cluster both synthetic and real datasets.

dc.identifier.uri

https://hdl.handle.net/10161/5617

dc.subject

Statistics

dc.subject

Bayesian statistics

dc.subject

Clustering

dc.subject

Dirichlet process

dc.subject

Hierarchical Dirichlet process

dc.subject

Nonparametric Bayesian models

dc.title

Clustering Multiple Related Datasets with a Hierarchical Dirichlet Process

dc.type

Master's thesis

duke.embargo.months

6

Files

Original bundle

Name: deOliveiraSales_duke_0066N_10999.pdf
Size: 20.75 MB
Format: Adobe Portable Document Format
