Machine Learning with Dirichlet and Beta Process Priors: Theory and Applications

Bayesian nonparametric methods are useful for modeling data without having to fix the complexity of the model a priori; instead, this complexity is determined by the data. This dissertation considers two such problems: the number of components in a mixture model and the number of factors in a latent factor model. The Dirichlet process and the beta process, respectively, are the Bayesian nonparametric priors selected for handling them.

The flexibility of Bayesian nonparametric priors arises from their definition over an infinite-dimensional parameter space, so in theory there are infinitely many latent components and infinitely many latent factors. Nevertheless, draws from each prior produce only a small number of components or factors that actually appear in a given finite data set. The number of these components and factors, and their corresponding parameter values, are left for the data to decide.
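As a rough illustration of this point, the sketch below draws mixture weights using the standard stick-breaking construction of the Dirichlet process and counts how many components a finite sample actually occupies. It is a minimal sketch under stated assumptions, not the constructions developed in the dissertation: the function names, the concentration parameter `alpha`, the truncation level of 200, and the choice to fold the leftover stick mass into the last atom are all illustrative.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw truncated stick-breaking weights with V_k ~ Beta(1, alpha).

    Returns the list of weights and the stick mass left unassigned
    after `truncation` breaks.
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)   # fraction broken off the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


def count_occupied_components(n, alpha, truncation=200, seed=0):
    """Sample n component labels and count how many distinct components appear.

    Assumption: the infinite prior is approximated by `truncation` atoms,
    and the leftover mass is added to the last atom so the weights sum to 1.
    """
    rng = random.Random(seed)
    weights, leftover = stick_breaking_weights(alpha, truncation, rng)
    weights[-1] += leftover
    labels = rng.choices(range(truncation), weights=weights, k=n)
    return len(set(labels))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, count_occupied_components(n, alpha=2.0))
```

Although 200 components are available in this truncated approximation, the number occupied by the sample is typically far smaller and tends to grow slowly as the sample size increases.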

This dissertation is split between novel practical applications and novel theoretical results for these priors. For the Dirichlet process, we investigate stick-breaking representations for the finite Dirichlet process and their application to novel sampling techniques, as well as a novel mixture modeling framework that incorporates multiple modalities within a data set. For the beta process, we present a new stick-breaking construction for the infinite-dimensional prior, and consider applications to image interpolation problems and dictionary learning for compressive sensing.





Paisley, John William (2010). Machine Learning with Dirichlet and Beta Process Priors: Theory and Applications. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.