Advances in Bayesian Hierarchical Modeling with Tree-based Methods
Developing flexible tools that scale to large datasets with complex structure while still providing interpretable outputs is a major goal of modern statistical modeling. A family of models especially well suited to this task is the class of Pólya tree-type models. Following a divide-and-conquer strategy, these tree-based methods transform the original task into a series of smaller tasks that are easier to solve, while their nonparametric nature provides the flexibility needed to handle datasets with complex structure. In this work, we develop three novel tree-based methods that tackle different challenges in Bayesian hierarchical modeling. Our first two methods are designed specifically for microbiome sequencing data, which consist of high-dimensional counts with a complex, domain-specific covariate structure and exhibit large cross-sample variation. These features limit the performance of generic statistical tools and call for special modeling considerations. Both methods inherit the flexibility and computational efficiency of general tree-based methods, and both use domain knowledge directly to help infer the complex dependency structure among microbiome categories by incorporating the phylogenetic tree into the modeling framework. An important task in microbiome research is to compare the composition of microbial communities across groups of subjects. We first propose a model for this classic two-sample problem in the microbiome context by transforming it into a multiple testing problem, with a series of tests defined at the internal nodes of the phylogenetic tree. To improve the power of the test, we use a graphical model that allows information sharing among the tests. A regression-type adjustment is also considered to reduce the chance of false discoveries. Next, we introduce a model-based clustering method for microbiome count data within a Dirichlet process mixture framework.
The phylogenetic tree is used to construct the mixture kernels, which offers a flexible covariate structure. To better detect clusters that are driven not only by the dominant microbiome categories, we introduce a subroutine in the clustering procedure that selects the subset of internal nodes of the tree that are relevant for clustering. This subroutine also helps avoid potential overfitting. Our third contribution is a framework for causal inference through Bayesian recursive partitioning that jointly models covariate balance and the potential outcomes. Taking a retrospective perspective, we model the covariates and the outcome conditional on treatment assignment. For the challenging task of modeling multivariate covariates, we adopt a flexible nonparametric prior that focuses on the relation between the covariate distributions under the two treatment groups, while integrating out other aspects of these distributions that are irrelevant for estimating the causal effect.
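To make the divide-and-conquer idea concrete, the sketch below decomposes leaf-level counts along a binary tree into node-level binomial splits (how many counts go to the left child out of the total at that node), then scores each internal node with a simple Beta-binomial Bayes factor comparing "the two groups have different split probabilities" against "they share one". This is a simplified illustration under stated assumptions, not the dissertation's model: the toy tree `TREE`, the helper names (`subtree_count`, `node_splits`, `log_bf_node`), the Beta(1, 1) prior, and the pooling of counts within each group are all hypothetical. In particular, the sketch omits the graphical model that shares information across nodes and ignores cross-sample variation.

```python
from math import lgamma

# Hypothetical toy binary tree (a stand-in for a phylogenetic tree):
# each internal node maps to (left_child, right_child); leaves are OTU names.
TREE = {
    "root": ("n1", "n2"),
    "n1": ("otu_a", "otu_b"),
    "n2": ("otu_c", "otu_d"),
}


def subtree_count(node, counts):
    """Total count over all leaves below `node` (a leaf returns its own count)."""
    if node not in TREE:
        return counts.get(node, 0)
    left, right = TREE[node]
    return subtree_count(left, counts) + subtree_count(right, counts)


def node_splits(counts):
    """Map each internal node to (count going to the left child, total count at the node)."""
    return {
        node: (subtree_count(left, counts), subtree_count(node, counts))
        for node, (left, _right) in TREE.items()
    }


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_bf_node(y1, n1, y2, n2, a=1.0, b=1.0):
    """Log Bayes factor at one node: separate vs shared Beta(a, b) split probability.

    y_i of n_i counts go left in group i. Binomial coefficients appear in both
    marginal likelihoods and cancel, so they are omitted. Positive values favor
    a difference between the groups at this node (assumed simplified test).
    """
    log_m_shared = log_beta(a + y1 + y2, b + (n1 - y1) + (n2 - y2)) - log_beta(a, b)
    log_m_diff = (log_beta(a + y1, b + n1 - y1) - log_beta(a, b)
                  + log_beta(a + y2, b + n2 - y2) - log_beta(a, b))
    return log_m_diff - log_m_shared


# Usage with made-up pooled counts for two groups: the groups differ only in
# how counts split between otu_a and otu_b, i.e. at node n1.
group1 = {"otu_a": 30, "otu_b": 10, "otu_c": 5, "otu_d": 5}
group2 = {"otu_a": 10, "otu_b": 30, "otu_c": 5, "otu_d": 5}
s1, s2 = node_splits(group1), node_splits(group2)
for node in TREE:
    print(node, round(log_bf_node(*s1[node], *s2[node]), 2))
```

Each node's test only involves a binomial split, which is why the tree decomposition turns one high-dimensional comparison into many small, conjugate ones; a difference localized at one branch shows up only at the corresponding node.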
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations
Works are deposited here by their authors, and represent their research and opinions, not those of Duke University. Some materials and descriptions may include offensive content.