Browsing by Subject "Bayesian methods"
Item Open Access A Geometric Approach for Inference on Graphical Models (2009) Lunagomez, Simon
We formulate a novel approach to infer conditional independence models or Markov structure of a multivariate distribution. Specifically, our objective is to place informative prior distributions over graphs (decomposable and unrestricted) and sample efficiently from the induced posterior distribution. We also explore the idea of factorizing according to complete sets of a graph, which implies working with a hypergraph that cannot be retrieved from the graph alone. The key idea we develop in this paper is a parametrization of hypergraphs using the geometry of points in $R^m$. This induces informative priors on graphs from specified priors on finite sets of points. Constructing hypergraphs from finite point sets has been well studied in the fields of computational topology and random geometric graphs. We develop the framework underlying this idea and illustrate its efficacy using simulations.
Item Open Access Bayesian Methods to Characterize Uncertainty in Predictive Modeling of the Effect of Urbanization on Aquatic Ecosystems (2010) Kashuba, Roxolana Oresta
Urbanization causes myriad changes in watershed processes, ultimately disrupting the structure and function of stream ecosystems. Urban development introduces contaminants (human waste, pesticides, industrial chemicals). Impervious surfaces and artificial drainage systems speed the delivery of contaminants to streams, while bypassing soil filtration and local riparian processes that can mitigate the impacts of these contaminants, and disrupting the timing and volume of hydrologic patterns. Aquatic habitats where biota live are degraded by sedimentation, channel incision, floodplain disconnection, substrate alteration and elimination of reach diversity. These compounding changes ultimately lead to alteration of invertebrate community structure and function.
Because the effects of urbanization on stream ecosystems are complex, multilayered, and interacting, modeling these effects presents many unique challenges, including: addressing and quantifying processes at multiple scales; representing major interrelated, simultaneously acting dynamics at the system level; incorporating uncertainty resulting from imperfect knowledge, imperfect data, and environmental variability; and integrating multiple sources of available information about the system into the modeling construct. These challenges can be addressed by using a Bayesian modeling approach. Specifically, the use of multilevel hierarchical models and Bayesian network models allows the modeler to harness the hierarchical nature of the U.S. Geological Survey (USGS) Effect of Urbanization on Stream Ecosystems (EUSE) dataset to predict invertebrate response at both basin and regional levels, concisely represent and parameterize this system of complicated cause-and-effect relationships and uncertainties, calculate the full probabilistic function of all variables efficiently as the product of more manageable conditional probabilities, and include both expert knowledge and data. Utilizing this Bayesian framework, this dissertation develops a series of statistically rigorous and ecologically interpretable models predicting the effect of urbanization on invertebrates, as well as a unique, systematic methodology that creates an informed expert prior and then updates this prior with available data using conjugate Dirichlet-multinomial distribution forms. The resulting models elucidate differences between regional responses to urbanization (particularly due to background agriculture and precipitation) and address the influences of multiple urban-induced stressors acting simultaneously from a new system-level perspective.
These Bayesian modeling approaches quantify previously unexplained regional differences in biotic response to urbanization, capture multiple interacting environmental and ecological processes affected by urbanization, and ultimately link urbanization effects on stream biota to a management context, such that these models describe and quantify how changes in drivers lead to changes in the regulatory endpoint (the Biological Condition Gradient; BCG).
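The conjugate Dirichlet-multinomial update named in the abstract above can be illustrated with a minimal sketch. The three categories, the prior pseudo-counts, and the observed counts below are hypothetical values chosen for illustration; they are not taken from the dissertation or the EUSE dataset. The only fact relied on is the standard conjugacy result: a Dirichlet prior with parameters alpha, combined with multinomial counts n, gives a Dirichlet posterior with parameters alpha + n.

```python
def dirichlet_multinomial_update(prior_alpha, counts):
    """Return posterior Dirichlet parameters alpha_k + n_k for each category k."""
    if len(prior_alpha) != len(counts):
        raise ValueError("prior_alpha and counts must have the same length")
    return [a + n for a, n in zip(prior_alpha, counts)]


def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: alpha_k / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Hypothetical expert prior over three condition classes, expressed as
# pseudo-counts (10 pseudo-observations in total). Assumed values.
prior_alpha = [5.0, 3.0, 2.0]

# Hypothetical observed counts per class from 20 sites. Assumed values.
counts = [2, 6, 12]

posterior_alpha = dirichlet_multinomial_update(prior_alpha, counts)
prior_mean = dirichlet_mean(prior_alpha)
posterior_mean = dirichlet_mean(posterior_alpha)

print("prior mean:    ", prior_mean)
print("posterior mean:", posterior_mean)
```

Because the prior is stated as pseudo-counts, its weight relative to the data is explicit: here 10 prior pseudo-observations are combined with 20 real ones, so the posterior mean moves most of the way toward the observed frequencies.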
Item Open Access Bayesian Models for Causal Analysis with Many Potentially Weak Instruments (2015) Jiang, Sheng
This paper investigates Bayesian instrumental variable models with many instruments. The number of instrumental variables grows with the sample size and is allowed to be much larger than the sample size. Under a sparsity condition on the instrument coefficients, we characterize a general prior specification under which posterior consistency of the parameters is established, and we calculate the corresponding convergence rate.
In particular, we show posterior consistency for a class of spike and slab priors on the many potentially weak instruments. The spike and slab prior shrinks the number of instrumental variables, which avoids overfitting and provides uncertainty quantification for the first stage. A simulation study is conducted to illustrate the convergence notion and estimation/selection performance under dependent instruments. Computational issues related to the Gibbs sampler are also discussed.
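To make the spike and slab idea concrete, here is a minimal sketch of drawing first-stage coefficients from one common form of such a prior. The abstract does not specify the class of priors studied, so the point-mass spike at zero, the Gaussian slab, and all parameter values below are assumptions made for illustration only; the function name is hypothetical.

```python
import random


def sample_spike_and_slab(n_coef, inclusion_prob, slab_sd, rng):
    """Draw n_coef coefficients from a point-mass spike and Gaussian slab prior.

    Each coefficient is exactly 0 with probability 1 - inclusion_prob (spike)
    and is drawn from N(0, slab_sd^2) otherwise (slab).
    """
    coefs = []
    for _ in range(n_coef):
        if rng.random() < inclusion_prob:
            coefs.append(rng.gauss(0.0, slab_sd))  # slab: instrument is active
        else:
            coefs.append(0.0)  # spike: instrument is excluded
    return coefs


rng = random.Random(0)  # fixed seed so the draw is reproducible

# Assumed setting: 1000 candidate instruments, 5% prior inclusion probability.
draw = sample_spike_and_slab(n_coef=1000, inclusion_prob=0.05, slab_sd=1.0, rng=rng)
n_active = sum(1 for c in draw if c != 0.0)

print("active instruments in this prior draw:", n_active)
```

A small inclusion probability encodes the sparsity assumption: most instruments receive a coefficient of exactly zero, so only a few contribute to the first stage in any given draw.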
Item Open Access Some Recent Advances in Non- and Semiparametric Bayesian Modeling with Copulas, Mixtures, and Latent Variables (2013) Murray, Jared
This thesis develops flexible non- and semiparametric Bayesian models for mixed continuous, ordered and unordered categorical data. These methods have a range of possible applications; the applications considered in this thesis are drawn primarily from the social sciences, where multivariate, heterogeneous datasets with complex dependence and missing observations are the norm.
The first contribution is an extension of the Gaussian factor model to Gaussian copula factor models, which accommodate continuous and ordinal data with unspecified marginal distributions. I describe how this model is the most natural extension of the Gaussian factor model, preserving its essential dependence structure and the interpretability of factor loadings and the latent variables. I adopt an approximate likelihood for posterior inference and prove that, if the Gaussian copula model is true, the approximate posterior distribution of the copula correlation matrix asymptotically converges to the correct parameter under nearly any marginal distributions. I demonstrate with simulations that this method is both robust and efficient, and illustrate its use in an application from political science.
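The generative idea behind a Gaussian copula, where dependence comes from latent Gaussian variables while the margins are left free, can be sketched as follows. This is not the thesis's factor model or its approximate likelihood; it is a minimal bivariate illustration, and the correlation, the exponential margin, and the ordinal cutpoints are all assumed values.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(rho, n, rng):
    """Draw n pairs (u1, u2) in the unit square from a bivariate Gaussian copula."""
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Build z2 so that corr(z1, z2) = rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # The normal CDF maps each latent Gaussian to a Uniform(0, 1) margin.
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs


def exponential_quantile(u, rate):
    """Inverse CDF of an exponential margin (a continuous variable)."""
    u = min(u, 1.0 - 1e-12)  # guard against log(0)
    return -math.log(1.0 - u) / rate


def ordinal_quantile(u, cutpoints):
    """Map u to an ordered category index by counting cutpoints below it."""
    return sum(1 for c in cutpoints if u > c)


rng = random.Random(1)
uniforms = sample_gaussian_copula(rho=0.7, n=2000, rng=rng)

# Assumed margins: one exponential variable and one four-level ordinal variable.
data = [
    (exponential_quantile(u1, rate=2.0), ordinal_quantile(u2, [0.25, 0.5, 0.9]))
    for u1, u2 in uniforms
]
print(data[:3])
```

Swapping the two quantile functions changes the marginal distributions without touching the latent correlation, which is the separation between margins and dependence that copula models exploit.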
The second contribution is a novel nonparametric hierarchical mixture model for continuous, ordered and unordered categorical data. The model includes a hierarchical prior used to couple component indices of two separate models, which are also linked by local multivariate regressions. This structure effectively overcomes the limitations of existing mixture models for mixed data, namely the overly strong local independence assumptions. In the proposed model local independence is replaced by local conditional independence, so that the induced model is able to more readily adapt to structure in the data. I demonstrate the utility of this model as a default engine for multiple imputation of mixed data in a large repeated-sampling study using data from the Survey of Income and Program Participation. I show that it improves substantially on its most popular competitor, multiple imputation by chained equations (MICE), while enjoying certain theoretical properties that MICE lacks.
The third contribution is a latent variable model for density regression. Most existing density regression models are quite flexible but somewhat cumbersome to specify and fit, particularly when the regressors are a combination of continuous and categorical variables. The majority of these methods rely on extensions of infinite discrete mixture models to incorporate covariate dependence in mixture weights, atoms or both. I take a fundamentally different approach, introducing a continuous latent variable which depends on covariates through a parametric regression. In turn, the observed response depends on the latent variable through an unknown function. I demonstrate that a spline prior for the unknown function is quite effective relative to Dirichlet process mixture models in density estimation settings (i.e., without covariates), even though these Dirichlet process mixtures have better theoretical properties asymptotically. The spline formulation enjoys a number of computational advantages over more flexible priors on functions. Finally, I demonstrate the utility of this model in regression applications using a dataset on U.S. wages from the Census Bureau, where I estimate the return to schooling as a smooth function of the quantile index.