Bayesian Multivariate Count Models for the Analysis of Microbiome Studies

dc.contributor.advisor

David, Lawrence A

dc.contributor.author

Silverman, Justin David

dc.date.accessioned

2019-06-07T19:48:12Z

dc.date.available

2019-06-07T19:48:12Z

dc.date.issued

2019

dc.department

Computational Biology and Bioinformatics

dc.description.abstract

Advances in high-throughput DNA sequencing allow for rapid and affordable surveys of thousands of bacterial taxa across thousands of samples. The exploding availability of sequencing data has poised microbiota research to advance our understanding of fields as diverse as ecology, evolution, medicine, and agriculture. Yet, while microbiota data is now ubiquitous, methods for the analysis of such data remain underdeveloped. This gap reflects the challenge of analyzing sparse, high-dimensional count data that contains compositional (relative abundance) information. To address these challenges, this dissertation introduces a number of tools for Bayesian inference applied to microbiome data. A central theme throughout this work is the use of multinomial logistic-normal models, which are found to concisely address these challenges. In particular, the connection between the logistic-normal distribution and the Aitchison geometry of the simplex is used repeatedly to develop interpretable tools for the analysis of microbiome data.
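As a rough illustration of the multinomial logistic-normal model named above, the following Python sketch draws log-ratio coordinates from a normal distribution, maps them onto the simplex with the inverse additive log-ratio (ALR) transform, and then draws sequence counts from a multinomial. The function names, the choice of the last taxon as the ALR reference, and the diagonal (independent) covariance are simplifying assumptions for illustration; this is not the dissertation's own software.

```python
import math
import random

def alr_inv(eta):
    """Map a vector in R^(D-1) onto the D-part simplex (last part is the reference)."""
    exps = [math.exp(e) for e in eta] + [1.0]
    total = sum(exps)
    return [v / total for v in exps]

def alr(pi):
    """Additive log-ratio transform: log(pi_j / pi_D) for j < D."""
    return [math.log(p / pi[-1]) for p in pi[:-1]]

def sample_mln(mu, sigma, n, rng):
    """Draw one count vector from a multinomial logistic-normal model.

    Assumption: independent normal noise on each log-ratio coordinate
    (a diagonal covariance), chosen only to keep the sketch short.
    """
    eta = [rng.gauss(m, s) for m, s in zip(mu, sigma)]  # latent log-ratios
    pi = alr_inv(eta)                                   # point on the simplex
    draws = rng.choices(range(len(pi)), weights=pi, k=n)  # n sequencing reads
    counts = [0] * len(pi)
    for d in draws:
        counts[d] += 1
    return counts, pi

rng = random.Random(0)
counts, pi = sample_mln(mu=[1.0, 0.0, -2.0], sigma=[0.5, 0.5, 0.5], n=1000, rng=rng)
```

Because counts are sampled last, a low-abundance taxon can be observed as zero purely from finite sequencing depth, even though every entry of `pi` is strictly positive.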

The structure of this dissertation is as follows. Chapter 1 introduces key challenges in the analysis of microbiome data. Chapter 2 introduces a novel log-ratio transform between the simplex and real space to enable the development of statistical tools for compositional data with phylogenetic structure. Chapter 3 introduces a multinomial logistic-normal generalized dynamic linear modelling framework for the analysis of microbiome time-series data. Chapter 4 explores the analysis of zero values in sequence count data from a stochastic process perspective and demonstrates that zero-inflated models often produce counter-intuitive results in this regime. Finally, Chapter 5 introduces the theory of Marginally Latent Matrix-T Processes as a means of developing efficient and accurate inference for a large class of multinomial logistic-normal models, including linear regression, non-linear regression, and dynamic linear models. Notably, the inference schemes developed in Chapter 5 are often found to be orders of magnitude faster than Hamiltonian Monte Carlo without sacrificing accuracy in point estimation or uncertainty quantification.
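One standard way to bring phylogenetic structure into a log-ratio transform is through isometric log-ratio (ILR) balances defined by a binary tree: each internal node splits its descendant taxa into two groups of sizes r and s, and the balance contrasting them is sqrt(rs/(r+s)) * log(g(x+)/g(x-)), where g is the geometric mean. The Python sketch below computes such generic balances; the helper names and the four-taxon tree are hypothetical, and the code is not the specific transform developed in Chapter 2.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def balance(x, plus, minus):
    """ILR balance contrasting the parts indexed by `plus` against `minus`."""
    r, s = len(plus), len(minus)
    g_plus = geometric_mean([x[i] for i in plus])
    g_minus = geometric_mean([x[i] for i in minus])
    return math.sqrt(r * s / (r + s)) * math.log(g_plus / g_minus)

def clr(x):
    """Centered log-ratio transform: log of each part over the geometric mean."""
    g = geometric_mean(x)
    return [math.log(v / g) for v in x]

# Hypothetical 4-taxon tree ((0,1),(2,3)): one balance per internal node.
PARTITION = [
    ([0, 1], [2, 3]),  # root: clade {0,1} versus clade {2,3}
    ([0], [1]),        # left internal node
    ([2], [3]),        # right internal node
]

def ilr(x, partition=PARTITION):
    """ILR coordinates of composition x under a sequential binary partition."""
    return [balance(x, plus, minus) for plus, minus in partition]

x = [0.1, 0.2, 0.3, 0.4]
coords = ilr(x)
```

Because each balance is a log-ratio of geometric means, it is unchanged when the composition is rescaled, and for a full binary tree the coordinates preserve Aitchison distances: their squared sum equals that of the centered log-ratio (clr) vector.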

dc.identifier.uri

https://hdl.handle.net/10161/18669

dc.subject

Statistics

dc.subject

Microbiology

dc.subject

Bioinformatics

dc.subject

Bayesian

dc.subject

Compositional Data

dc.subject

High-dimensional data

dc.subject

Microbiome

dc.subject

Sequencing

dc.subject

Time-series

dc.title

Bayesian Multivariate Count Models for the Analysis of Microbiome Studies

dc.type

Dissertation

Files

Name: Silverman_duke_0066D_14963.pdf
Size: 19.78 MB
Format: Adobe Portable Document Format
