Browsing by Author "Herring, Amy H"
Item Open Access Advances in Bayesian Factor Modeling and Scalable Gaussian Process Regression (2020) Moran, Kelly R.
Correlated measurements arise across a diverse array of disciplines such as epidemiology, toxicology, genomics, economics, and meteorology. Factor models describe the association between variables by assuming that some latent factors drive structured variation therein. Gaussian process (GP) models, on the other hand, describe the association between variables using a distance-based covariance kernel. This dissertation introduces two novel extensions of Bayesian factor models driven by applied problems, and then proposes an algorithm to allow for scalable approximate Bayesian GP sampling. First, the FActor Regression for Verbal Autopsy (FARVA) model is developed for predicting the cause of death (COD) and cause-specific mortality fraction in low-resource settings based on verbal autopsies. Both the mean and the association between symptoms provide information used to differentiate decedents across COD groups. This class of hierarchical factor regression models avoids restrictive assumptions of standard methods, allows both the mean and covariance to vary with COD category, and can include covariate information on the decedent, region, or events surrounding death. Next, the Bayesian partially Supervised Sparse and Smooth Factor Analysis (BS3FA) model is developed to enable toxicologists, who are faced with a rising tide of chemicals under regulation and in use, to choose which chemicals to prioritize for screening and to predict the toxicity of as-yet-unscreened chemicals based on their molecular structure. Latent factors driving structured variability are assumed to be shared between the molecular structure observations and dose-response observations from high-throughput screening. These shared latent factors allow the model to learn a distance between chemicals targeted to toxicity, rather than one based on molecular structure alone.
Finally, the Fast Increased Fidelity Approximate GP (FIFA-GP) allows for the association between observations to be modeled by a high fidelity Gaussian process approximation even when the number of observations is on the order of 10^5. A sampling algorithm that scales at O(n log^2(n)) time is described, and a proof showing that the approximation's Kullback-Leibler divergence to the true posterior can be made arbitrarily small is provided.
Item Open Access Advances in Bayesian Hierarchical Models for Complex Health Data (2023) Nguyen, Phuc Hong
With the advancement of technology in screening and tracking risk factors as well as human health outcomes, there is increasing richness and complexity in health data. This dissertation presents methodological and applied work using Bayesian hierarchical models to exploit dependency structure in the data to improve estimation efficiency, and sometimes also reduce computational cost and increase interpretability. In Chapter 2, we present a multivariate factor analysis model with time-varying effects to assess the longitudinal effects of prenatal exposure to phthalates on the risk of childhood obesity in children aged 4 to 10. In Chapter 3, we present a framework and package for power analysis using Monte Carlo simulation for study design as well as model comparison of complex models for correlated chemical mixture exposure data. In Chapter 4, we introduce a new way to characterize bias due to unmeasured confounding using a set of imperfect negative control outcomes, taking advantage of the knowledge that they share common unobserved causes. Finally, in Chapter 5, we present a new tree representation of brain connectomes based on the biological hierarchy of brain regions. In all these applications, we use Bayesian hierarchical models for borrowing information across related observations and enforcing latent structures.
Item Open Access Advances in Bayesian Hierarchical Models Motivated by Environmental Applications (2023) Jin, Bora
This thesis presents Bayesian hierarchical models that are designed to tackle challenges and accommodate insights from environmental applications. In many environmental applications, we face high-dimensional and/or large functional data with complex dependence structure. It is of fundamental interest to build an interpretable statistical model that appropriately characterizes the complex dependence and generates accurate predictions. First, Bayesian matrix completion (BMC) is developed to fill missing elements in a large but sparse binary matrix of bioactivity across thousands of chemicals and assay endpoints. Sparsity is a well-known problem in toxicology data because it is not feasible to test all possible combinations of chemicals and assay endpoints even with highly advanced technology. BMC tackles this sparsity through a Bayesian hierarchical framework and simultaneously models heteroscedastic errors and a nonparametric mean function with common latent factors to suggest a more interpretable and broader definition of activity. An application to real data identifies the chemicals most likely to be active for human disease outcomes. Next, the Barrier Overlap-Removal Acyclic directed graph Gaussian Process (BORA-GP) is proposed, a class of scalable nonstationary Gaussian processes (GPs) that can handle complex domain geometries. The spatial distribution of measurements that are observed only in some constrained domains can be significantly impacted by physical barriers in the domains. Typical spatial GP models are inappropriate in this case because they may lead to incorrect smoothing over the barriers. BORA-GP constructs sparse directed acyclic graphs (DAGs) with neighbors conforming to barriers, enabling characterization of physically sensible dependence in constrained domains. We apply BORA-GP to predict sea surface salinity (SSS) in the Arctic Ocean.
Finally, we propose another class of nonstationary processes that characterize varying directional associations in space and time for point-referenced data. Our construction places a prior over possible directional edges within sparse DAGs, accounting for uncertainty in directional correlation patterns across a domain. The resulting Bag of DAGs processes (BAGs) lead to interpretable nonstationarity and to scalability for large data due to the sparsity of the DAGs. We analyze the spatiotemporal movement of fine particulate matter in California using BAGs, in which a directed edge represents a prevailing wind direction that induces associated covariance in the particulate matter.
Item Open Access The Diet of Lumbee Indians in Robeson County, NC (2019) Zhao, Xinlu
Background: Nutrition and dietary patterns are among the most crucial healthcare concerns, especially because of their close association with chronic diseases such as obesity, diabetes, hypertension, cardiovascular disease, and some cancers. Our study focused on the Lumbee Native American tribe of North Carolina, who have diet-related health conditions and susceptibility to chronic diseases. Our study aimed to identify the dietary patterns of the local community, both Lumbee Indians and non-Lumbees living in the area, and to provide recommendations for future programs and policies. Methods: Our descriptive study explored dietary patterns (food groups and food categories) in Robeson County, NC. We used the National Health and Nutrition Examination Survey (NHANES) Food Frequency Questionnaire to record dietary information. We evaluated participants’ knowledge level of hypertension using the Hypertension Knowledge Questionnaire (HKQ). Results: We enrolled 277 participants, of whom 115 (50.6%) were Lumbee and 112 (49.3%) were non-Lumbee. Most of our participants were female (n = 137; 58%), with a median age of 68.39 (IQR 63.0-76.0). Compared with NHANES 2005–2006 data, our participants reported significantly lower intake frequency for fruits, vegetables, dairy, snacks, mixed dishes, and both alcoholic and nonalcoholic beverages (p<0.05). Our participants had a high average consumption frequency of sugar-sweetened beverages (2.03 times per day). The consumption frequency of fruits, vegetables, grains, dairy, protein, oil and fat, and sugar was not significantly different across race or chronic disease status (p>0.05). Differences by gender were observed in the average consumption frequency of vegetables (3.46 times per day for females vs. 3.11 times per day for males), fruits (1.92 times per day for females vs. 1.82 times per day for males), and grains (2.47 times per day for males vs. 1.75 times per day for females). Most of our participants could answer more than 16 out of 25 questions on the HKQ correctly, although the correct-answer rate for some food-related questions, such as those about pickles and crackers, was relatively low. Conclusions: People of different races and chronic disease status in Robeson County shared a similar dietary pattern in general, which was characterized by a low consumption frequency of fruits, vegetables, and dairy products compared with NHANES 2005–2006. While efforts are needed to address the health disparity in Robeson, policymakers should consider the unique role of women in the education and communication of dietary information. Stronger policies are needed to restrict consumption of low-nutrient, energy-dense foods to improve the dietary pattern.