Browsing by Author "Wolpert, Robert L"
Item Open Access A Geometric Approach for Inference on Graphical Models (2009) Lunagomez, Simon
We formulate a novel approach to infer conditional independence models or Markov structure of a multivariate distribution. Specifically, our objective is to place informative prior distributions over graphs (decomposable and unrestricted) and sample efficiently from the induced posterior distribution. We also explore the idea of factorizing according to complete sets of a graph, which implies working with a hypergraph that cannot be retrieved from the graph alone. The key idea we develop in this paper is a parametrization of hypergraphs using the geometry of points in R^m. This induces informative priors on graphs from specified priors on finite sets of points. Constructing hypergraphs from finite point sets has been well studied in the fields of computational topology and random geometric graphs. We develop the framework underlying this idea and illustrate its efficacy using simulations.

Item Open Access A Tapered Pareto-Poisson Model for Extreme Pyroclastic Flows: Application to the Quantification of Volcano Hazards (2015) Dai, Fan
This paper discusses parameter estimation in a proposed tapered Pareto-Poisson model for the assessment of large pyroclastic flows, which are essential in quantifying the size and risk of volcanic hazards. In dealing with the tapered Pareto distribution, the paper applies both maximum likelihood estimation and a Bayesian framework with objective priors and the Metropolis algorithm. The techniques are illustrated by an example of modeling extreme flow volumes at the Soufrière Hills Volcano, and simulation results are presented.
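For readers who want to experiment with the tapered Pareto distribution mentioned above, here is a minimal sketch of maximum likelihood estimation under one standard parametrization, with survival function S(x) = (a/x)^alpha * exp((a - x)/theta) for x >= a. The data, threshold, and starting values are illustrative assumptions, not taken from the thesis.

```python
# Minimal sketch: MLE for a tapered Pareto distribution.
# Assumed form: S(x) = (a/x)^alpha * exp((a - x)/theta), x >= a.
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, x, a):
    alpha, theta = params
    if alpha <= 0 or theta <= 0:
        return np.inf
    # density: f(x) = (alpha/x + 1/theta) * (a/x)^alpha * exp((a - x)/theta)
    return -np.sum(np.log(alpha / x + 1.0 / theta)
                   + alpha * np.log(a / x)
                   + (a - x) / theta)

rng = np.random.default_rng(0)
a = 1.0
# crude inversion sampler on a grid, for illustration only
grid = np.linspace(a, 50, 20000)
surv = (a / grid) ** 1.5 * np.exp((a - grid) / 10.0)
x = np.interp(rng.uniform(size=500), 1.0 - surv, grid)

fit = minimize(neg_log_lik, x0=[1.0, 5.0], args=(x, a), method="Nelder-Mead")
print("MLE (alpha, theta):", fit.x)
```

The same log-likelihood could serve as the basis for a Metropolis sampler under objective priors, as the abstract describes.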
Item Open Access Bayesian Modeling and Adaptive Monte Carlo with Geophysics Applications (2013) Wang, Jianyu
The first part of the thesis focuses on the development of Bayesian modeling motivated by geophysics applications. In Chapter 2, we model the frequency of pyroclastic flows collected from the Soufrière Hills volcano. Multiple change points within the dataset reveal several limitations of existing methods in the literature. We propose Bayesian hierarchical models (BBH) by introducing an extra level of hierarchy with hyperparameters, adding a penalty term to constrain close consecutive rates, and using a mixture prior distribution to more accurately match certain circumstances in reality. We end the chapter with a description of the prediction procedure, which is the biggest advantage of the BBH in comparison with other existing methods. In Chapter 3, we develop new statistical techniques to model and relate three complex processes and datasets: the process of extrusion of magma into the lava dome, the growth of the dome as measured by its height, and the rockfalls as an indication of the dome's instability. First, we study the dynamic negative binomial branching process and use it to model the rockfalls. Moreover, a generalized regression model is proposed to regress daily rockfall numbers on the extrusion rate and dome height. Furthermore, we solve an inverse problem from the regression model and predict the extrusion rate based on rockfalls and dome height.
The other focus of the thesis is adaptive Markov chain Monte Carlo (MCMC) methods. In Chapter 4, we improve upon the Wang-Landau (WL) algorithm. The WL algorithm is an adaptive sampling scheme that modifies the target distribution to enable the chain to visit low-density regions of the state space. However, the approach relies heavily on a partition of the state space that is left to the user to specify. As a result, implementing and using the algorithm is time-consuming and far from automatic. We propose an automatic, adaptive partitioning scheme that continually refines the initial partition as needed during sampling. We show that this overcomes the limitations of a user-specified partition, making the algorithm significantly more automatic and user-friendly while also making its performance dramatically more reliable and robust. In Chapter 5, we consider the convergence and autocorrelation aspects of MCMC. We propose an Exploration/Exploitation (XX) approach to constructing adaptive MCMC algorithms, which combines adaptation schemes of distinct types. The exploration piece uses adaptation strategies aimed at exploring new regions of the target distribution, thus improving the rate of convergence to equilibrium. The exploitation piece involves an adaptation component that decreases autocorrelation when sampling among regions already discovered. We demonstrate that the combined XX algorithm significantly outperforms either original algorithm on difficult multimodal sampling problems.
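To make the Wang-Landau idea concrete, here is a minimal sketch on a 1D bimodal target with a fixed, user-specified partition (the very limitation that Chapter 4's adaptive partitioning removes). The target, partition, and tuning constants are all illustrative assumptions.

```python
# Minimal Wang-Landau sketch: reweight the target by adaptive bin-weights so
# the chain is pushed out of bins it has already visited often.
import numpy as np

rng = np.random.default_rng(1)

def log_target(x):
    # two well-separated modes
    return np.logaddexp(-0.5 * (x + 4.0) ** 2, -0.5 * (x - 4.0) ** 2)

edges = np.linspace(-8, 8, 9)          # fixed partition into 8 bins
log_theta = np.zeros(len(edges) - 1)   # adaptive log bin-weights
log_f = 1.0                            # modification factor, annealed over time

def bin_of(x):
    return int(np.clip(np.searchsorted(edges, x) - 1, 0, len(log_theta) - 1))

x = 0.0
for it in range(1, 50001):
    y = x + rng.normal(scale=1.0)
    # Metropolis ratio for the reweighted target pi(x) / theta(bin(x))
    log_acc = (log_target(y) - log_theta[bin_of(y)]) - \
              (log_target(x) - log_theta[bin_of(x)])
    if np.log(rng.uniform()) < log_acc:
        x = y
    log_theta[bin_of(x)] += log_f      # penalize the bin just visited
    if it % 10000 == 0:
        log_f /= 2.0                   # reduce adaptation strength

print("estimated log bin-weights:", np.round(log_theta - log_theta.max(), 2))
```

With a poorly chosen `edges`, the reweighting stalls, which is the practical motivation for the automatic partition refinement the chapter proposes.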
Item Open Access Development and Implementation of Bayesian Computer Model Emulators (2011) Lopes, Danilo Lourenco
Our interest is the risk assessment of rare natural hazards, such as large volcanic pyroclastic flows. Since catastrophic consequences of volcanic flows are rare events, our analysis benefits from the use of a computer model to provide information about these events under natural conditions that may not have been observed in reality.
A common problem in the analysis of computer experiments, however, is the high computational cost associated with each simulation of a complex physical process. We tackle this problem by using a statistical approximation (emulator) to predict the output of this computer model at untried values of inputs. Gaussian process response surfaces are commonly used in these applications because they are fast and easy to use in the analysis; a minimal emulator sketch follows this abstract.
We explore several aspects of the implementation of Gaussian process emulators in a Bayesian context. First, we propose an improvement to the implementation of the plug-in approach to Gaussian processes. Next, we evaluate the performance of a spatial model for large data sets in the context of computer experiments.
Computer model data can also be combined with field observations in order to calibrate the emulator and obtain statistical approximations to the computer model that are closer to reality. We present an application where we learn the joint distribution of inputs from field data and then link this auxiliary information to the emulator in a calibration process.
One of the outputs of our computer model is a surface of maximum volcanic flow height over some geographical area. We show how the topography of the volcano area plays an important role in determining the shape of this surface, and we propose methods to incorporate geophysical information in the multivariate analysis of computer model output.
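As referenced above, here is a minimal sketch of a Gaussian process emulator: fit to a handful of runs of an expensive simulator, then predict with uncertainty at untried inputs. The toy "simulator," design, and kernel settings are illustrative assumptions, not the thesis's model.

```python
# Minimal GP emulator sketch: train on a small design, predict at new inputs.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def simulator(x):
    # stand-in for an expensive computer model
    return np.sin(3.0 * x) + 0.5 * x

X_design = np.linspace(0.0, 2.0, 8).reshape(-1, 1)   # small training design
y = simulator(X_design).ravel()

gp = GaussianProcessRegressor(
    kernel=ConstantKernel(1.0) * RBF(length_scale=0.5),
    normalize_y=True,
)
gp.fit(X_design, y)

X_new = np.array([[0.37], [1.11], [1.93]])           # untried inputs
mean, sd = gp.predict(X_new, return_std=True)
for xi, m, s in zip(X_new.ravel(), mean, sd):
    print(f"x = {xi:.2f}: emulator mean = {m:.3f}, sd = {s:.3f}")
```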
Item Open Access On Uncertainty Quantification for Systems of Computer Models (2017) Kyzyurova, Ksenia
Scientific inquiry about natural phenomena and processes is increasingly relying on the use of computer models as simulators of such processes. The challenge of using computer models for scientific investigation is that they are expensive in terms of computational cost and resources. However, the core methodology of fast statistical emulation (approximation) of a computer model overcomes this computational problem.
Complex phenomena and processes are often described not by a single computer model, but by a system of computer models or simulators. Direct emulation of a system of simulators may be infeasible for computational and logistical reasons.
This thesis proposes a statistical framework for fast emulation of systems of computer models and demonstrates its potential for inferential and predictive scientific goals.
The first chapter of the thesis introduces the Gaussian stochastic process (GaSP) emulator of a single simulator and summarizes the ideas and findings in the rest of the thesis. The second chapter investigates the possibility of using independent GaSP emulators of computer models for fast construction of emulators of systems of computer models. The resulting approximation to a system of computer models is called the linked emulator; a minimal sketch of the linking idea follows this abstract. The third chapter discusses the irrelevance of attempting to model multivariate output of a computer model for the purpose of emulating that model. The linear model of coregionalization (LMC) is used to demonstrate this irrelevance, both from a theoretical perspective and through simulation studies. The fourth chapter introduces a framework for calibration of a system of computer models, using its linked emulator. The linked emulator allows for development of independent emulators of submodels on their own separately constructed design spaces, thus leading to effective dimension reduction in the explored parameter space. The fifth chapter addresses the use of some non-Gaussian emulators, in particular censored and truncated GaSP emulators. The censored emulator is constructed to appropriately account for zero-inflated output of a computer model, arising when there are large regions of the input space for which the computer model output is zero. The truncated GaSP accommodates computer model output that is constrained to lie in a certain region. The linked emulator, for systems of computer models whose individual subemulators are either censored or truncated, is also presented. The last chapter concludes with an exposition of further research directions based on the ideas explored in the thesis.
The methodology developed in this thesis is illustrated by an application to quantification of the hazard from pyroclastic flow from the Soufrière Hills Volcano on the island of Montserrat; a case study on prediction of volcanic ash transport and dispersal from the Eyjafjallajökull volcano, Iceland, on April 14-16, 2010; and calibration of a vapour-liquid equilibrium model, a submodel of the Aspen Plus© chemical process software for design and deployment of amine-based CO₂ capture systems.
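As referenced in the abstract above, here is a minimal sketch of linking two independent GP emulators f and g to approximate the composed system g(f(x)). The coupling below is done by Monte Carlo propagation of the first emulator's predictive distribution; the thesis instead derives closed-form mean and variance for the linked emulator. The toy submodels, designs, and settings are illustrative assumptions.

```python
# Linked-emulator sketch: emulate f and g separately on their own designs,
# then propagate uncertainty from emulator f through emulator g.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(2)
f = lambda x: np.sin(2.0 * x)          # first submodel
g = lambda z: z ** 2 + 0.1 * z         # second submodel, takes f's output

Xf = np.linspace(0, 3, 10).reshape(-1, 1)
Xg = np.linspace(-1, 1, 10).reshape(-1, 1)   # separately constructed design
em_f = GaussianProcessRegressor(kernel=RBF(0.5), normalize_y=True).fit(Xf, f(Xf).ravel())
em_g = GaussianProcessRegressor(kernel=RBF(0.5), normalize_y=True).fit(Xg, g(Xg).ravel())

x_star = np.array([[1.7]])                   # untried system input
mu_f, sd_f = em_f.predict(x_star, return_std=True)
z_draws = rng.normal(mu_f[0], sd_f[0], size=2000).reshape(-1, 1)
mu_g, sd_g = em_g.predict(z_draws, return_std=True)
samples = rng.normal(mu_g, sd_g)             # draws from the linked predictive

print("linked emulator mean:", samples.mean(), " sd:", samples.std())
print("true system value:   ", g(f(x_star))[0, 0])
```

Note that each emulator is trained only on its own design space, which is the dimension-reduction advantage the fourth chapter exploits for calibration.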
Item Open Access Redefine statistical significance (Nature Human Behaviour, 2017-09-03) Benjamin, Daniel J; Berger, James O; Johannesson, Magnus; Nosek, Brian A; Wagenmakers, E-J; Berk, Richard; Bollen, Kenneth A; Brembs, Björn; Brown, Lawrence; Camerer, Colin; Cesarini, David; Chambers, Christopher D; Clyde, Merlise; Cook, Thomas D; De Boeck, Paul; Dienes, Zoltan; Dreber, Anna; Easwaran, Kenny; Efferson, Charles; Fehr, Ernst; Fidler, Fiona; Field, Andy P; Forster, Malcolm; George, Edward I; Gonzalez, Richard; Goodman, Steven; Green, Edwin; Green, Donald P; Greenwald, Anthony G; Hadfield, Jarrod D; Hedges, Larry V; Held, Leonhard; Hua Ho, Teck; Hoijtink, Herbert; Hruschka, Daniel J; Imai, Kosuke; Imbens, Guido; Ioannidis, John PA; Jeon, Minjeong; Jones, James Holland; Kirchler, Michael; Laibson, David; List, John; Little, Roderick; Lupia, Arthur; Machery, Edouard; Maxwell, Scott E; McCarthy, Michael; Moore, Don A; Morgan, Stephen L; Munafò, Marcus; Nakagawa, Shinichi; Nyhan, Brendan; Parker, Timothy H; Pericchi, Luis; Perugini, Marco; Rouder, Jeff; Rousseau, Judith; Savalei, Victoria; Schönbrodt, Felix D; Sellke, Thomas; Sinclair, Betsy; Tingley, Dustin; Van Zandt, Trisha; Vazire, Simine; Watts, Duncan J; Winship, Christopher; Wolpert, Robert L; Xie, Yu; Young, Cristobal; Zinman, Jonathan; Johnson, Valen E

Item Open Access Semiparametric Bayesian Regression with Applications in Astronomy (2014) Broadbent, Mary Elizabeth
In this thesis we describe a class of Bayesian semiparametric models, known as Lévy Adaptive Regression Kernels (LARK); a novel method for posterior computation for those models; and the applications of these models in astronomy, in particular to the analysis of the photon fluence time series of gamma-ray bursts. Gamma-ray bursts are bursts of photons which arrive in a varying number of overlapping pulses with a distinctive "fast-rise, exponential decay" shape in the time domain (a toy sketch of this pulse shape follows this item's abstract). LARK models allow us to do inference not only on the number of pulses but also on the parameters which describe the pulses, such as incident time or decay rate.
In Chapter 2, we describe a novel method to aid posterior computation in infinitely-divisible models, of which LARK models are a special case, when the posterior is evaluated through Markov chain Monte Carlo. This is applied in Chapter 3, where a time series representing the photon fluence in a single energy channel is analyzed using LARK methods.
Due to the effect of the discriminators on BATSE and other instruments, it is important to model the gamma-ray bursts in the incident space. Chapter 4 describes the first model of bursts in the incident photon space, rather than after they have been distorted by the discriminators; since modeling photons as they enter the detector means modeling both the energy and the arrival time of each incident photon, this model is also the first to jointly model the time and energy domains.
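As referenced in the abstract above, here is a toy sketch of the "fast-rise, exponential decay" (FRED) pulse shape and a LARK-style superposition, in which the fluence curve is a sum of a random number of overlapping pulses. The pulse parametrization and all constants are illustrative assumptions, not the thesis's exact kernel.

```python
# Toy FRED pulse and random superposition of pulses forming a burst profile.
import numpy as np

def fred_pulse(t, t0, amp, rise, decay):
    """One pulse: fast exponential rise before t0, slower exponential decay after."""
    out = np.where(t < t0,
                   np.exp((t - t0) / rise),
                   np.exp(-(t - t0) / decay))
    return amp * out

rng = np.random.default_rng(3)
t = np.linspace(0.0, 20.0, 1000)

n_pulses = rng.poisson(4) + 1                      # random number of pulses
signal = np.zeros_like(t)
for _ in range(n_pulses):
    signal += fred_pulse(t,
                         t0=rng.uniform(2, 18),     # incident time
                         amp=rng.gamma(2.0, 1.0),   # pulse amplitude
                         rise=rng.uniform(0.05, 0.3),
                         decay=rng.uniform(0.5, 3.0))

counts = rng.poisson(signal + 0.2)                 # photon counts with background
print(f"{n_pulses} pulses; peak intensity {signal.max():.2f}; total counts {counts.sum()}")
```

In a LARK analysis, both `n_pulses` and each pulse's parameters would be objects of posterior inference rather than fixed simulation inputs.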
Item Open Access Topics in Bayesian Computer Model Emulation and Calibration, with Applications to High-Energy Particle Collisions (2019) Coleman, Jacob Ryan
Problems involving computer model emulation arise when scientists simulate costly experiments using computationally expensive computer models. To probe the experimental design space more quickly, statisticians build emulators that act as fast surrogates for the expensive computer models. The emulators are typically Gaussian processes, in order to induce spatial correlation in the input space. Often the main scientific interest lies in inference on one or more input parameters of the computer model which do not vary in nature. Inference on these input parameters is referred to as "calibration," and these inputs are referred to as "calibration parameters." We first detail our emulation and calibration model for an application in high-energy particle physics; this model brings together some existing ideas in the literature on handling multivariate output and lays out a foundation for the remainder of the thesis.
In the next two chapters, we introduce novel ideas in the field of computer model emulation and calibration. The first addresses the problem of model comparison in this context: how to compare competing computer models while simultaneously performing calibration. Using a mixture model to facilitate the comparison, we demonstrate that by conditioning on the mixture parameter we can recover the calibration parameter posterior from an independent calibration model. This mixture is then extended to the case of correlated data, a crucial innovation for making this comparison framework useful in the particle collision setting. Lastly, we explore two possible non-exchangeable mixture models, in which model preference changes over the input space.
The second novel idea addresses density estimation when only coarse bin counts are available. We develop an estimation method which avoids costly numerical integration and maintains plausible correlation for nearby bins. Additionally, we extend the method to density regression so that a full density can be predicted from an input parameter, having been trained only on coarse histograms. This enables inference on the input parameter, and we develop an importance sampling method that compares favorably to the foundational calibration method detailed earlier.
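To illustrate one generic way to estimate a density from coarse bin counts without numerical quadrature, here is a minimal sketch: a Gaussian mixture fit by maximizing the multinomial likelihood of the counts, with bin probabilities computed as closed-form CDF differences. This is a stand-in under stated assumptions, not the thesis's estimator; the counts and mixture size are illustrative.

```python
# Binned density estimation sketch: multinomial likelihood over coarse bins,
# bin probabilities as Gaussian-mixture CDF differences (no integration needed).
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

edges = np.linspace(-4, 4, 9)                      # 8 coarse bins
counts = np.array([5, 30, 60, 25, 20, 55, 35, 6])  # illustrative bin counts

def neg_log_lik(params):
    logit_w, mu1, mu2, log_s1, log_s2 = params
    w = 1.0 / (1.0 + np.exp(-logit_w))
    cdf = (w * norm.cdf(edges, mu1, np.exp(log_s1))
           + (1 - w) * norm.cdf(edges, mu2, np.exp(log_s2)))
    p = np.clip(np.diff(cdf), 1e-12, None)
    p = p / p.sum()                                # renormalize over observed bins
    return -np.sum(counts * np.log(p))

fit = minimize(neg_log_lik, x0=[0.0, -2.0, 2.0, 0.0, 0.0], method="Nelder-Mead")
logit_w, mu1, mu2, log_s1, log_s2 = fit.x
print(f"weight: {1/(1+np.exp(-logit_w)):.2f}, means: {mu1:.2f}, {mu2:.2f}")
```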
Item Open Access Water Quality Models for Shellfish Harvesting Area Management (2008-08-19) Gronewold, Andrew
This doctoral dissertation presents the derivation and application of a series of water quality models and modeling strategies which provide critical guidance to water quality-based management decisions. Each model focuses on identifying and explicitly acknowledging uncertainty and variability in terrestrial and aquatic environments, and in water quality sampling and analysis procedures. While the modeling tools I have developed can be used to assist management decisions in waters with a wide range of designated uses, my research focuses on developing tools which can be integrated into a probabilistic or Bayesian network model supporting total maximum daily load (TMDL) assessments of impaired shellfish harvesting waters.

Notable products of my research include a novel approach to assessing fecal indicator bacteria (FIB)-based water quality standards for impaired resource waters, and new standards based on distributional parameters of the in situ FIB concentration probability distribution (as opposed to the current approach of using most probable number (MPN) or colony-forming unit (CFU) values). In addition, I develop a model explicitly acknowledging the probabilistic basis for calculating MPN and CFU values to determine whether a change in North Carolina Department of Environment and Natural Resources Shellfish Sanitation Section (NCDENR-SSS) standard operating procedure from a multiple tube fermentation (MTF)-based procedure to a membrane filtration (MF) procedure might cause a change in the observed frequency of water quality standard violations. This comparison is based on an innovative theoretical model of the MPN probability distribution for any observed CFU estimate from the same water quality sample, and is applied to recent water quality samples collected and analyzed by NCDENR-SSS for fecal coliform concentration using both MTF and MF analysis tests.

I also develop the graphical model structure for a Bayesian network model relating FIB fate and transport processes with water quality-based management decisions, and encode a simplified version of the model in commercially available Bayesian network software. Finally, I present a Bayesian strategy for calibrating bacterial water quality models which improves model performance by explicitly acknowledging the probabilistic relationship between in situ FIB concentrations and common concentration estimating procedures.
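To make the MPN concept referenced above concrete, here is a minimal sketch of how a most probable number arises: the maximum likelihood estimate of bacterial concentration from multiple tube fermentation results, assuming Poisson-distributed organism counts so that a tube is positive unless it received zero organisms. The dilution scheme and tube outcomes are illustrative assumptions, not NCDENR-SSS data.

```python
# MPN sketch: MLE of concentration from serial-dilution tube outcomes.
import numpy as np
from scipy.optimize import minimize_scalar

volumes = np.array([10.0, 1.0, 0.1])   # mL of sample per tube at each dilution
n_tubes = np.array([5, 5, 5])          # tubes per dilution
positive = np.array([5, 3, 1])         # observed positive tubes

def neg_log_lik(log_c):
    c = np.exp(log_c)                   # concentration (organisms per mL)
    p_pos = 1.0 - np.exp(-c * volumes)  # P(tube receives >= 1 organism)
    p_pos = np.clip(p_pos, 1e-12, 1 - 1e-12)
    return -np.sum(positive * np.log(p_pos)
                   + (n_tubes - positive) * np.log(1.0 - p_pos))

fit = minimize_scalar(neg_log_lik, bounds=(-8, 8), method="bounded")
print(f"MPN estimate: {np.exp(fit.x):.2f} organisms per mL")
```

The dissertation's point is that such MPN (and CFU) values are themselves probabilistic summaries of the underlying in situ concentration, which its models acknowledge explicitly.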