Bayesian Nonparametric Methods for Epidemiology and Clustering

Bayesian nonparametric methods employ prior distributions with large support in the space of probabilistic models. The flexibility of these methods enables them to address challenging inference tasks. This thesis develops Bayesian nonparametric methodologies for problems in epidemiology and clustering. During an infectious disease outbreak, there is interest in (1) understanding the impact of environmental conditions, human behavior, genetic variants, and public policy on transmission; (2) monitoring the rate of transmission across regions and over time; and (3) producing short-term forecasts of disease incidence to facilitate decision making and planning by policy makers and members of the public. The data typically available to address these questions are incidence data: cases, hospitalizations, or deaths occurring in a certain population during a certain time interval. These data pose a challenge to methodology because they have an indirect and nonlinear relationship with the transmission rate, and they may suffer from artifacts and systematic biases that can vary across time and across regions. In Chapter 2, we exploit the flexibility of Bayesian nonparametric models to account for the many irregularities in these data.

Cluster analysis is the task of identifying meaningful subgroups in data. A large variety of clustering algorithms have been developed, but within the Bayesian paradigm, clustering has nearly always been performed by associating observations with components of a mixture distribution. These mixture models are inherently limited by a tradeoff between component flexibility and identifiability. Thus, relatively inflexible components are used, often leading to disappointing results. In Chapter 3, we develop a decision-theoretic framework for Bayesian Level-Set (BALLET) clustering, which exploits Bayesian nonparametric density posteriors. The approach avoids some pitfalls of classical Bayesian clustering methods by leveraging ideas from the algorithmic and frequentist literature. Finally, we note that level-set clustering represents a simple example of clustering into non-exchangeable subsets, since one subset is designated as noise points. In Chapter 4, we develop loss functions for non-exchangeable partitions with an arbitrary number of categories. We show that the notion of Categorized Partitions (CaPos) is useful in practical situations and that our novel loss functions yield sensible decision-theoretic point estimates.

Buch, David Anthony (2023). Bayesian Nonparametric Methods for Epidemiology and Clustering. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.