Ecological Modeling via Bayesian Nonparametric Species Sampling Priors




Species sampling models are a broad class of discrete Bayesian nonparametric priors that model the sequential appearance of distinct tags, called species or clusters, in a sequence of labeled objects. Over the last 50 years, species sampling priors have found much success in a variety of settings, including clustering and density estimation. However, despite rich theoretical and methodological developments, these models have rarely been used by applied ecologists, even though ecological investigations often involve the modeling of actual species. This dissertation aims to partially fill this gap by showing how species sampling models can be useful to scientists and practitioners in ecology. Our emphasis is on clustering and on the species discovery properties of species sampling models. In particular, Chapter 2 illustrates how a Dirichlet process mixture model with a random precision parameter yields greater robustness when inferring the number of clusters, or communities, in a given population. We introduce a novel prior for the precision, called the Stirling-gamma distribution, which allows for transparent elicitation supported by theoretical results. We illustrate its advantages by detecting communities in a colony of ant workers. Chapter 3 presents a general Bayesian framework for modeling accumulation curves, which summarize the sequential discovery of distinct species over time. This work is inspired by traditional species sampling models such as the Dirichlet process and the Pitman–Yor process. By modeling the discovery probability as a survival function of some latent variables, we develop a flexible specification that accounts for both finite and infinite species richness. We apply our model to a large fungal biodiversity study from Finland. Finally, Chapter 4 presents BayesANT, a novel Bayesian nonparametric taxonomic classifier. Here, the goal is to predict the taxonomy of DNA sequences sampled from the environment. The difficulty of this task is that the vast majority of species either lack a reference barcode or are still unknown to science, so species novelty must be accounted for during classification. BayesANT builds upon Dirichlet-multinomial kernels to model DNA sequences and upon species sampling models to account for such novelty. We show that it attains excellent classification performance, especially when the true taxa of the test sequences are not observed in the training set.

All methods presented in this dissertation are freely available as R packages. Our hope is that these contributions will pave the way for future use of Bayesian nonparametric methods in applied ecological analyses.
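Both the clustering and the species discovery behavior of the Dirichlet and Pitman–Yor processes follow from a simple sequential predictive rule. After n observations containing k distinct species, the next observation is a new species with probability (α + σk)/(α + n). Here α is the precision (concentration) parameter and σ ∈ [0, 1) is the discount, with σ = 0 recovering the Dirichlet process. The Python sketch below illustrates this textbook rule. It is not the dissertation's models or its R packages, and all function names are hypothetical. It simulates an accumulation curve and computes the Dirichlet process expected number of clusters, E[K_n] = Σ_{i=1}^{n} α/(α + i − 1).

```python
import random


def py_new_species_prob(n, k, alpha, sigma):
    """Probability that observation n+1 is a new species under a Pitman-Yor
    process (precision alpha, discount sigma), given n observations with k
    distinct species. sigma = 0 gives the Dirichlet process."""
    if not (0 <= sigma < 1) or alpha <= -sigma:
        raise ValueError("need 0 <= sigma < 1 and alpha > -sigma")
    if n == 0:
        return 1.0  # the first observation is always a new species
    return (alpha + sigma * k) / (alpha + n)


def simulate_accumulation_curve(n, alpha, sigma, seed=None):
    """Simulate the number of distinct species seen after each of n draws."""
    rng = random.Random(seed)
    counts = []  # counts[j] = number of observations of species j so far
    curve = []
    for i in range(n):
        if rng.random() < py_new_species_prob(i, len(counts), alpha, sigma):
            counts.append(1)  # discover a new species
        else:
            # Re-observe species j with probability proportional to counts[j] - sigma
            weights = [c - sigma for c in counts]
            j = rng.choices(range(len(counts)), weights=weights)[0]
            counts[j] += 1
        curve.append(len(counts))
    return curve


def dp_expected_clusters(n, alpha):
    """E[K_n] under a Dirichlet process: sum_{i=1}^n alpha / (alpha + i - 1)."""
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    curve = simulate_accumulation_curve(1000, alpha=2.0, sigma=0.25, seed=42)
    print("distinct species after 1000 draws:", curve[-1])
    print("DP expected clusters (n=1000, alpha=2):", dp_expected_clusters(1000, 2.0))
```

Under the Dirichlet process the number of species grows logarithmically in n. With σ > 0 it grows roughly like n^σ, which is one reason the choice of prior matters when inferring species richness.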





Zito, Alessandro (2023). Ecological Modeling via Bayesian Nonparametric Species Sampling Priors. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public under a Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND) license.