Browsing by Author "Berger, James O"
Item Open Access: Bayesian Adjustment for Multiplicity (2009), Scott, James Gordon.
This thesis is about Bayesian approaches for handling multiplicity. It considers three main kinds of multiple-testing scenarios: tests of exchangeable experimental units, tests for variable inclusion in linear regression models, and tests for conditional independence in jointly normal vectors. Multiplicity adjustment in these three areas will be seen to have many common structural features. Though the modeling approach throughout is Bayesian, frequentist reasoning regarding error rates will often be employed.
Chapter 1 frames the issues in the context of historical debates about Bayesian multiplicity adjustment. Chapter 2 confronts the problem of large-scale screening of functional data, where control over Type-I error rates is a crucial issue. Chapter 3 develops new theory for comparing Bayes and empirical-Bayes approaches for multiplicity correction in regression variable selection. Chapters 4 and 5 describe new theoretical and computational tools for Gaussian graphical-model selection, where multiplicity arises in performing many simultaneous tests of pairwise conditional independence. Chapter 6 introduces a new approach to sparse-signal modeling based upon local shrinkage rules. Here the focus is not on multiplicity per se, but rather on using ideas from Bayesian multiple-testing models to motivate a new class of multivariate scale-mixture priors. Finally, Chapter 7 describes some directions for future study, many of which are the subjects of my current research agenda.
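To make the mechanism concrete, here is a minimal sketch (an illustration of the standard two-groups setup, not code from the thesis) of fully Bayesian multiplicity adjustment: a common prior inclusion probability p is given its own uniform prior, so posterior inclusion probabilities for genuine signals are automatically pulled down as more null tests are added. The variance inflation tau2 under the alternative is an assumed, fixed hyperparameter.

import numpy as np
from scipy import stats

def posterior_inclusion(y, tau2=4.0, grid=2001):
    """Posterior P(signal_i | y), integrating p over a uniform prior."""
    p = np.linspace(0.0, 1.0, grid)[1:-1, None]        # column grid over p
    f0 = stats.norm.pdf(y, scale=1.0)                  # null density, shape (n,)
    f1 = stats.norm.pdf(y, scale=np.sqrt(1.0 + tau2))  # alternative density
    mix = p * f1 + (1 - p) * f0                        # (grid, n) mixture terms
    logL = np.log(mix).sum(axis=1)                     # marginal log-likelihood on the grid
    w = np.exp(logL - logL.max())
    w /= w.sum()                                       # posterior weights for p
    return w @ (p * f1 / mix)                          # averaged conditional inclusion probability

rng = np.random.default_rng(0)
signals = np.array([4.0, 3.5])                         # two clear signals
for n_nulls in (10, 500):
    y = np.concatenate([signals, rng.normal(size=n_nulls)])
    print(n_nulls, posterior_inclusion(y)[:2])         # inclusion probabilities shrink as nulls grow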
Item Open Access: Bayesian Model Uncertainty and Foundations (2018), Pena, Victor.
This dissertation contains research on Bayesian model uncertainty and the foundations of statistical inference.
In Chapter 2, we study the properties of constrained empirical Bayes (EB) priors on regression coefficients. Unrestricted EB procedures can have undesirable properties when their "estimates" correspond to hyperparameters that would be seen as overly informative in an actual Bayesian analysis. For that reason, we propose constraining EB procedures so that they are at least as vague as proper Bayesian lower bounds (which can be either informative or "noninformative"). The main emphasis of the chapter is on studying the properties of a constrained EB prior that has Zellner's g-prior with g=n as its lower bound. We show that it avoids some of the pitfalls of unconstrained EB priors and the lower bound, and see that it behaves similarly to the Bayesian Information Criterion (BIC).
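A minimal sketch of the constrained EB idea (my reading of the construction, using the standard g-prior Bayes factor and the local EB estimate of g familiar from the mixtures-of-g-priors literature; not code from the dissertation): the marginal-likelihood-maximizing estimate of g is floored at g = n, so the fitted prior is never more informative than the g = n lower bound.

import numpy as np

def gprior_log_bf(g, R2, n, p):
    """log Bayes factor of a p-predictor model vs. the null under Zellner's g-prior."""
    return 0.5 * ((n - 1 - p) * np.log1p(g) - (n - 1) * np.log1p(g * (1 - R2)))

def constrained_eb_g(R2, n, p):
    F = (R2 / p) / ((1 - R2) / (n - 1 - p))  # the usual F statistic
    g_eb = max(F - 1.0, 0.0)                 # local empirical Bayes estimate of g
    return max(g_eb, float(n))               # constrain to be at least as vague as g = n

n, p, R2 = 100, 3, 0.15
g = constrained_eb_g(R2, n, p)
print(g, gprior_log_bf(g, R2, n, p))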
In Chapter 3, we take a close look at "information inconsistency." Information inconsistency is said to occur when, at a finite sample size, there is overwhelming evidence in favor of a hypothesis, yet the Bayes factor in its favor remains bounded. We investigate when it occurs (and when it does not) in normal linear models. Our conclusion is that conjugate priors are usually information-inconsistent, but thick-tailed priors and empirical Bayes procedures avoid the issue. The chapter also includes a discussion of the different formalizations of information inconsistency that have appeared in the literature, which are not equivalent.
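A quick numerical illustration of the phenomenon (a toy setup of my own, reusing the standard fixed-g Zellner prior Bayes factor for a normal linear model): with g held fixed, the log Bayes factor converges to the finite limit 0.5*(n-1-p)*log(1+g) even as the evidence R^2 approaches 1.

import numpy as np

def gprior_log_bf(g, R2, n=20, p=2):
    return 0.5 * ((n - 1 - p) * np.log1p(g) - (n - 1) * np.log1p(g * (1 - R2)))

for R2 in (0.9, 0.999, 0.999999):
    print(R2, gprior_log_bf(g=20.0, R2=R2))   # bounded despite overwhelming evidence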
In Chapter 4, we turn to "limit consistency," an asymptotic property of two-sample tests. Suppose the sample size of one of the groups goes to infinity while the sample size of the other stays fixed. According to our definition, limit consistency occurs if, under this asymptotic regime, the decision rule of the two-sample test converges to the decision rule of the one-sample test we would have performed had we known the parameters of the group with "infinite" data. We study limit consistency in the context of testing whether two normal means are equal, and conclude that parametrizations in which the two groups have common parameters are generally limit-consistent when the prior on the common parameters is flat.
Finally, the goal of Chapter 5 is to discuss two articles that cast doubt on the correctness and applicability of Birnbaum's theorem, which implies that statisticians who wish to respect the sufficiency and conditionality principles must accept the likelihood principle. This result, proved in 1962, is still highly controversial because many statisticians find sufficiency and conditionality appealing but not the likelihood principle (for example, the likelihood principle precludes the use of p-values, which remain ubiquitous in statistical practice). We provide counterarguments to the criticisms and put them in historical context.
Item Open Access: Bayesian Modeling Using Latent Structures (2012), Wang, Xiaojing.
This dissertation is devoted to modeling complex data from the Bayesian perspective via constructing priors with latent structures. There are three major contexts in which this is done: strategies for the analysis of dynamic longitudinal data, estimating shape-constrained functions, and identifying subgroups. The methodology is illustrated in three different interdisciplinary contexts: (1) adaptive measurement testing in education; (2) emulation of computer models for vehicle crashworthiness; and (3) subgroup analyses based on biomarkers.
Chapter 1 presents an overview of the latent structured priors employed and of the remainder of the thesis. Chapter 2 is motivated by the problem of analyzing dichotomous longitudinal data observed at variable and irregular time points for adaptive measurement testing in education. One of its main contributions lies in developing a new class of Dynamic Item Response (DIR) models via specifying a novel dynamic structure on the prior of the latent trait. Bayesian inference for DIR models is undertaken; it permits borrowing strength across individuals, allows retrospective analysis of an individual's changing ability, and allows online prediction of ability changes. A proof of posterior propriety is presented, ensuring that the objective Bayesian analysis is rigorous.
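A minimal sketch (an assumed form for illustration, not the thesis's exact specification) of a dynamic item response model: the latent ability follows a Gaussian random walk across time points, and dichotomous responses follow a Rasch-type logistic link. Posterior inference on the latent path (e.g., by MCMC) is what enables retrospective analysis and online prediction of ability changes.

import numpy as np

rng = np.random.default_rng(1)

def simulate_dir(n_times=20, items_per_time=5, sigma=0.3):
    theta = np.cumsum(rng.normal(0.0, sigma, n_times))    # latent ability random walk
    b = rng.normal(0.0, 1.0, (n_times, items_per_time))   # item difficulties
    prob = 1.0 / (1.0 + np.exp(-(theta[:, None] - b)))    # P(correct response)
    y = rng.binomial(1, prob)                             # dichotomous responses
    return theta, b, y

theta, b, y = simulate_dir()
print(y.shape)   # (time points, items per time point)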
Chapter 3 deals with nonparametric function estimation under shape constraints, such as monotonicity, convexity, or concavity. A motivating illustration is generating an emulator to approximate a computer model for vehicle crashworthiness. Although Gaussian processes are very flexible and widely used in function estimation, they are not naturally amenable to the incorporation of such constraints. Gaussian processes with the squared exponential correlation function have the interesting property that their derivative processes are also Gaussian processes, jointly Gaussian with the original process. This allows one to impose shape constraints through the derivative process. Two alternative ways of incorporating derivative information into Gaussian process priors are proposed, with one focusing on scenarios (important in emulation of computer models) in which the function may have flat regions.
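The joint Gaussian structure is concrete enough to write down. Below is a short sketch of the standard kernel identities for a GP and its derivative under the squared exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2)); the constraint-handling step suggested in the closing comment (conditioning on positive derivative values at a grid, e.g., by rejection sampling) is one generic possibility, not necessarily either of the two approaches proposed in the chapter.

import numpy as np

def se_joint_blocks(x, xd, l=1.0):
    """Covariance blocks of the joint Gaussian vector (f(x), f'(xd))."""
    d = x[:, None] - x[None, :]
    K_ff = np.exp(-d**2 / (2 * l**2))                            # cov(f(x), f(x'))
    dd = x[:, None] - xd[None, :]
    K_fd = (dd / l**2) * np.exp(-dd**2 / (2 * l**2))             # cov(f(x), f'(xd))
    e = xd[:, None] - xd[None, :]
    K_dd = (1 / l**2 - e**2 / l**4) * np.exp(-e**2 / (2 * l**2)) # cov(f'(xd), f'(xd'))
    return K_ff, K_fd, K_dd

x = np.linspace(0, 1, 6)
K_ff, K_fd, K_dd = se_joint_blocks(x, x)
# assembling [[K_ff, K_fd], [K_fd.T, K_dd]] gives the joint covariance; sampling
# from it and keeping draws with positive derivatives yields monotone paths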
Chapter 4 introduces a Bayesian method to control for multiplicity in subgroup analyses through tree-based models that limit the subgroups under consideration to those that are a priori plausible. Once the prior modeling of the tree is accomplished, each tree will yield a statistical model; Bayesian model selection analyses then complete the statistical computation for any quantity of interest, resulting in multiplicity-controlled inferences. This research is motivated by a problem of biomarker and subgroup identification to develop tailored therapeutics. Chapter 5 presents conclusions and some directions for future research.
Item Open Access: Development and Implementation of Bayesian Computer Model Emulators (2011), Lopes, Danilo Lourenco.
Our interest is the risk assessment of rare natural hazards, such as large volcanic pyroclastic flows. Since catastrophic consequences of volcanic flows are rare events, our analysis benefits from the use of a computer model to provide information about these events under natural conditions that may not have been observed in reality.
A common problem in the analysis of computer experiments, however, is the high computational cost associated with each simulation of a complex physical process. We tackle this problem by using a statistical approximation (emulator) to predict the output of the computer model at untried input values. Gaussian process response surfaces are commonly used in these applications because they are fast and easy to use in the analysis.
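A minimal sketch of such an emulator (illustrative only, assuming noise-free simulator runs and a squared exponential kernel with a fixed range parameter): condition a Gaussian process on a handful of expensive runs, then predict the simulator output, with uncertainty, at untried inputs.

import numpy as np

def gp_emulator(X, y, Xstar, l=0.5, nugget=1e-8):
    """Gaussian process predictive mean and sd at untried inputs Xstar."""
    def k(A, B):
        d = A[:, None] - B[None, :]
        return np.exp(-d**2 / (2 * l**2))
    K = k(X, X) + nugget * np.eye(len(X))           # training covariance
    alpha = np.linalg.solve(K, y)
    mean = k(Xstar, X) @ alpha                      # predictive mean
    cov = k(Xstar, Xstar) - k(Xstar, X) @ np.linalg.solve(K, k(X, Xstar))
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

X = np.linspace(0, 1, 8)                            # a few expensive simulator runs
y = np.sin(2 * np.pi * X)                           # stand-in for simulator output
mean, sd = gp_emulator(X, y, np.linspace(0, 1, 50))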
We explore several aspects of the implementation of Gaussian process emulators in a Bayesian context. First, we propose an improvement to the implementation of the plug-in approach to Gaussian processes. Next, we evaluate the performance of a spatial model for large data sets in the context of computer experiments.
Computer model data can also be combined with field observations in order to calibrate the emulator and obtain statistical approximations to the computer model that are closer to reality. We present an application where we learn the joint distribution of inputs from field data and then bind this auxiliary information to the emulator in a calibration process.
One of the outputs of our computer model is a surface of maximum volcanic flow height over some geographical area. We show how the topography of the volcano area plays an important role in determining the shape of this surface, and we propose methods to incorporate geophysical information in the multivariate analysis of computer model output.
Item Open Access: Interfaces between Bayesian and Frequentist Multiple Testing (2015), Chang, Shih-Han.
This thesis investigates frequentist properties of Bayesian multiple testing procedures in a variety of scenarios and depicts the asymptotic behavior of Bayesian methods. Both Bayesian and frequentist approaches to multiplicity control are studied and compared, with special focus on understanding the multiplicity control behavior in situations of dependence between test statistics.
Chapter 2 examines a problem of testing mutually exclusive hypotheses with dependent data. The Bayesian approach is shown to have excellent frequentist properties and is argued to be the most effective way of obtaining frequentist multiplicity control without sacrificing power. Chapter 3 further generalizes the model so that multiple signals are acceptable, and characterizes the asymptotic behavior of the false positive rate and the expected number of false positives. Chapter 4 considers the problem of dealing with a sequence of different trials concerning some medical or scientific issue, and discusses the possibilities for multiplicity control of the sequence. Chapter 5 addresses issues and efforts in reconciling frequentist and Bayesian approaches in sequential endpoint testing. We consider the conditional frequentist approach in sequential endpoint testing and show several examples in which Bayesian and frequentist methodologies cannot be made to match.
Item Open Access: On Uncertainty Quantification for Systems of Computer Models (2017), Kyzyurova, Ksenia.
Scientific inquiry about natural phenomena and processes increasingly relies on computer models as simulators of such processes. The challenge of using computer models for scientific investigation is that they are expensive in terms of computational cost and resources. However, the core methodology of fast statistical emulation (approximation) of a computer model overcomes this computational problem.
Complex phenomena and processes are often described not by a single computer model, but by a system of computer models or simulators. Direct emulation of a system of simulators may be infeasible for computational and logistical reasons.
This thesis proposes a statistical framework for fast emulation of systems of computer models and demonstrates its potential for inferential and predictive scientific goals.
The first chapter of the thesis introduces the Gaussian stochastic process (GaSP) emulator of a single simulator and summarizes the ideas and findings in the rest of the thesis. The second chapter investigates the possibility of using independent GaSP emulators of computer models for fast construction of emulators of systems of computer models. The resulting approximation to a system of computer models is called the linked emulator. The third chapter discusses the irrelevance of attempting to model multivariate output of a computer model for the purpose of emulating that model. The linear model of coregionalization (LMC) is used to demonstrate this irrelevance, both from a theoretical perspective and in simulation studies. The fourth chapter introduces a framework for calibration of a system of computer models using its linked emulator. The linked emulator allows for the development of independent emulators of submodels on their own separately constructed design spaces, thus leading to effective dimension reduction in the explored parameter space. The fifth chapter addresses the use of some non-Gaussian emulators, in particular censored and truncated GaSP emulators. The censored emulator is constructed to appropriately account for zero-inflated output of a computer model, arising when there are large regions of the input space for which the computer model output is zero. The truncated GaSP accommodates computer model output that is constrained to lie in a certain region. The linked emulator for systems of computer models whose individual subemulators are either censored or truncated is also presented. The last chapter concludes with an exposition of further research directions based on the ideas explored in the thesis.
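A minimal sketch of the linking idea (a Monte Carlo version under assumptions of my own; the thesis develops closed-form expressions for the linked emulator's mean and variance): draws from the first emulator's predictive distribution are propagated through the second, approximating an emulator of the composed system f2(f1(x)). The two predict functions below are hypothetical stand-ins for fitted GaSP emulators.

import numpy as np

rng = np.random.default_rng(2)

def predict1(x):
    """Stand-in for emulator 1: predictive mean and sd at input x."""
    return np.sin(x), 0.10 + 0.05 * x

def predict2(z):
    """Stand-in for emulator 2: predictive mean and sd at input z."""
    return z**2, 0.05 * np.ones_like(z)

def linked_predict(x, n_mc=10000):
    m1, s1 = predict1(x)
    z = rng.normal(m1, s1, n_mc)          # draws from emulator 1's predictive
    m2, s2 = predict2(z)
    y = rng.normal(m2, s2)                # propagated through emulator 2
    return y.mean(), y.std()              # linked predictive mean and sd

print(linked_predict(0.7))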
The methodology developed in this thesis is illustrated by an application to quantification of the hazard from pyroclastic flows from the Soufrière Hills Volcano on the island of Montserrat; a case study on prediction of volcanic ash transport and dispersal from the Eyjafjallajökull volcano, Iceland, on April 14-16, 2010; and calibration of a vapour-liquid equilibrium model, a submodel of the Aspen Plus © chemical process software for design and deployment of amine-based CO₂ capture systems.
Item Open Access: Redefine statistical significance (Nature Human Behaviour, 2017-09-03), Benjamin, Daniel J; Berger, James O; Johannesson, Magnus; Nosek, Brian A; Wagenmakers, E-J; Berk, Richard; Bollen, Kenneth A; Brembs, Björn; Brown, Lawrence; Camerer, Colin; Cesarini, David; Chambers, Christopher D; Clyde, Merlise; Cook, Thomas D; De Boeck, Paul; Dienes, Zoltan; Dreber, Anna; Easwaran, Kenny; Efferson, Charles; Fehr, Ernst; Fidler, Fiona; Field, Andy P; Forster, Malcolm; George, Edward I; Gonzalez, Richard; Goodman, Steven; Green, Edwin; Green, Donald P; Greenwald, Anthony G; Hadfield, Jarrod D; Hedges, Larry V; Held, Leonhard; Hua Ho, Teck; Hoijtink, Herbert; Hruschka, Daniel J; Imai, Kosuke; Imbens, Guido; Ioannidis, John PA; Jeon, Minjeong; Jones, James Holland; Kirchler, Michael; Laibson, David; List, John; Little, Roderick; Lupia, Arthur; Machery, Edouard; Maxwell, Scott E; McCarthy, Michael; Moore, Don A; Morgan, Stephen L; Munafó, Marcus; Nakagawa, Shinichi; Nyhan, Brendan; Parker, Timothy H; Pericchi, Luis; Perugini, Marco; Rouder, Jeff; Rousseau, Judith; Savalei, Victoria; Schönbrodt, Felix D; Sellke, Thomas; Sinclair, Betsy; Tingley, Dustin; Van Zandt, Trisha; Vazire, Simine; Watts, Duncan J; Winship, Christopher; Wolpert, Robert L; Xie, Yu; Young, Cristobal; Zinman, Jonathan; Johnson, Valen E.
Item Open Access: Robust Uncertainty Quantification and Scalable Computation for Computer Models with Massive Output (2016), Gu, Mengyang.
Uncertainty quantification (UQ) is both an old and a new concept. The current novelty lies in the interactions and synthesis of mathematical models, computer experiments, statistics, field/real experiments, and probability theory, with a particular emphasis on large-scale simulations by computer models. The challenges come not only from the complexity of the scientific questions but also from the size of the information. The focus of this thesis is to provide statistical models that are scalable to the massive data produced in computer experiments and real experiments, through fast and robust statistical inference.
Chapter 2 provides a practical approach for simultaneously emulating/approximating a massive number of functions, with an application to hazard quantification for the Soufrière Hills Volcano on the island of Montserrat. Chapter 3 discusses another problem with massive data, in which the number of observations of a function is large; an exact algorithm that is linear in time is developed for the problem of interpolating methylation levels. Chapter 4 and Chapter 5 are both about robust inference for these models. Chapter 4 proposes a new robustness criterion for parameter estimation and shows that several methods of inference satisfy it. Chapter 5 develops a new prior that satisfies additional criteria and is therefore proposed for use in practice.
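A rough sketch (my simplification of the shared-correlation idea behind emulating many outputs at once; the exact construction in the thesis may differ) of how a massive number of functions can be emulated simultaneously: if all outputs share one correlation structure, a single Cholesky factorization of the correlation matrix serves every output column.

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def emulate_many(X, Y, Xstar, l=0.5, nugget=1e-8):
    """GP predictive means for every output column of Y under one shared kernel."""
    def k(A, B):
        d = A[:, None] - B[None, :]
        return np.exp(-d**2 / (2 * l**2))
    K = cho_factor(k(X, X) + nugget * np.eye(len(X)))   # factor once
    return k(Xstar, X) @ cho_solve(K, Y)                # predict all outputs together

X = np.linspace(0, 1, 10)                               # design points
Y = np.column_stack([np.sin(3 * X + s) for s in np.linspace(0, 1, 1000)])
print(emulate_many(X, Y, np.array([0.25, 0.75])).shape) # (2, 1000)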