Advances in Bayesian Hierarchical Models Motivated by Environmental Applications

This thesis presents Bayesian hierarchical models designed to address challenges in, and incorporate insights from, environmental applications. Such applications often involve high-dimensional and/or large functional data with complex dependence structure. It is of fundamental interest to build an interpretable statistical model that appropriately characterizes this dependence and generates accurate predictions.
First, Bayesian matrix completion (BMC) is developed to fill in missing elements of a large but sparse binary matrix of bioactivity across thousands of chemicals and assay endpoints. Sparsity is a well-known problem in toxicology data because it is not feasible to test all possible combinations of chemicals and assay endpoints, even with highly advanced technology. BMC tackles this sparsity through a Bayesian hierarchical framework that simultaneously models heteroscedastic errors and a nonparametric mean function with common latent factors, suggesting a more interpretable and broader definition of activity. An application to real data identifies the chemicals most likely to be active for human disease outcomes.
Next, the Barrier Overlap-Removal Acyclic directed graph Gaussian Process (BORA-GP) is proposed, a class of scalable nonstationary Gaussian processes (GPs) that can handle domains with complex geometries. The spatial distribution of measurements observed only in constrained domains can be significantly affected by physical barriers within those domains. Typical spatial GP models are inappropriate in this case because they may smooth incorrectly across the barriers. BORA-GP constructs sparse directed acyclic graphs (DAGs) with neighbors conforming to barriers, enabling characterization of physically sensible dependence in constrained domains. We apply BORA-GP to predict sea surface salinity (SSS) in the Arctic Ocean.
Finally, we propose another class of nonstationary processes that characterize varying directional associations in space and time for point-referenced data. Our construction places a prior over possible directional edges within sparse DAGs, accounting for uncertainty in directional correlation patterns across a domain. The resulting Bag of DAGs processes (BAGs) yield interpretable nonstationarity and scale to large data owing to the sparsity of the DAGs. We analyze the spatiotemporal movement of fine particulate matter in California using BAGs, in which a directed edge represents a prevailing wind direction that induces associated covariance in the particulate matter.
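
To make the idea of "neighbors conforming to barriers" concrete, the following is a minimal illustrative sketch and not the thesis's algorithm. It assumes 2-D point locations, barriers given as straight line segments, a fixed ordering of the points, and a simple rule: each point takes as DAG parents up to `m` of its nearest earlier points whose straight connecting path does not cross any barrier. The function names (`crosses`, `barrier_aware_parents`) and the toy coordinates are hypothetical.

```python
from math import hypot

def _orient(p, q, r):
    """Signed area of triangle pqr (>0 counter-clockwise, <0 clockwise)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def crosses(a, b, c, d):
    """True if segment ab properly crosses segment cd (mere touching is ignored)."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)

def barrier_aware_parents(points, barriers, m):
    """For each point i, return up to m nearest earlier points (indices < i)
    whose straight path to point i crosses no barrier segment.
    Edges only run from earlier to later points, so the graph is acyclic."""
    parents = []
    for i, p in enumerate(points):
        visible = [j for j in range(i)
                   if not any(crosses(p, points[j], c, d) for c, d in barriers)]
        visible.sort(key=lambda j: hypot(p[0] - points[j][0], p[1] - points[j][1]))
        parents.append(visible[:m])
    return parents

# Toy example (assumed data): a wall at x = 0.5 separates points 0 and 1.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
barriers = [((0.5, -1.0), (0.5, 0.4))]
dag = barrier_aware_parents(points, barriers, m=2)
no_barrier_dag = barrier_aware_parents(points, [], m=2)
```

In this toy setup, point 1 would take point 0 as a parent if the wall were absent, but the barrier-aware rule leaves it without that parent, so a GP built on this graph would not smooth directly across the wall. A real implementation would need to handle polygonal barriers, point ordering, and fallback neighbor choices, none of which are specified in the abstract.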






Jin, Bora (2023). Advances in Bayesian Hierarchical Models Motivated by Environmental Applications. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.