Browsing by Subject "Sensitivity analysis"
Item Open Access Computational Journalism: from Answering Question to Questioning Answers and Raising Good Questions (2015) Wu, You
Our media is saturated with claims of "facts" made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, e.g., is a claim "cherry-picking"? This paper proposes a Query Response Surface (QRS) based framework that models claims based on structured data as parameterized queries. A key insight is that we can learn a lot about a claim by perturbing its parameters and seeing how its conclusion changes. This framework lets us formulate and tackle practical fact-checking tasks (reverse-engineering vague claims and countering questionable claims) as computational problems. Within the QRS-based framework, we take one step further and propose a problem, along with efficient algorithms, for finding high-quality claims of a given form from data, i.e., raising good questions in the first place. This is achieved by using a limited number of high-valued claims to represent high-valued regions of the QRS. Besides the general-purpose problem of finding high-quality claims, lead-finding can be tailored towards specific claim quality measures, also defined within the QRS framework. An example of uniqueness-based lead-finding is presented for "one-of-the-few" claims, resulting in interpretable high-quality claims and an adjustable mechanism for ranking objects, e.g., NBA players, based on what claims can be made for them. Finally, we study the use of visualization as a powerful way of conveying the results of a large number of claims. An efficient two-stage sampling algorithm is proposed for generating the input of a 2D scatter plot with heatmap, evaluating a limited amount of data while preserving the two essential visual features, namely outliers and clusters.
For all the problems, we present real-world examples and experiments that demonstrate the power of our model, efficiency of our algorithms, and usefulness of their results.
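The idea of perturbing a claim's parameters can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the claim form (a window-over-window change in a time series), the function names, and the data are all hypothetical, chosen only to show how a response surface over nearby parameter settings might expose a cherry-picked claim.

```python
# Hypothetical yearly counts; not data from the dissertation.
series = [10, 12, 11, 13, 9, 15, 10, 11, 12, 10]

def window_change(series, end, width):
    """Parameterized claim query: mean of the `width` values ending at
    index `end` minus the mean of the `width` values just before them."""
    recent = series[end - width + 1 : end + 1]
    before = series[end - 2 * width + 1 : end - width + 1]
    return sum(recent) / width - sum(before) / width

def response_surface(series, ends, widths):
    """Evaluate the claim query on every valid (end, width) setting."""
    surface = {}
    for end in ends:
        for width in widths:
            if end - 2 * width + 1 >= 0 and end < len(series):
                surface[(end, width)] = window_change(series, end, width)
    return surface

def support_fraction(surface, original):
    """Share of settings whose result is at least as strong as the
    result at the original claim's parameters."""
    target = surface[original]
    return sum(1 for v in surface.values() if v >= target) / len(surface)

# Original claim: "the count jumped by 6 in year 5" (end=5, width=1).
surface = response_surface(series, ends=range(3, 10), widths=(1, 2))
fraction = support_fraction(surface, original=(5, 1))
print(surface[(5, 1)], fraction)
```

In this toy setting, a small support fraction suggests that the claim holds only at its chosen parameters, which is one informal reading of "cherry-picking".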
Item Open Access Essays on Propensity Score Methods for Causal Inference in Observational Studies (2018) Nguyen, Nghi Le Phuong
In this dissertation, I present three essays from three research projects that involve different uses of propensity scores in drawing causal inferences in observational studies.
Chapter 1 talks about the general idea of causal inference as well as the concept of randomized experiments and observational studies. It introduces the three different projects and their contributions to the literature.
Chapter 2 gives a critical review and an extensive discussion of several commonly used propensity score methods when the data have a multilevel structure, including matching, weighting, stratification, and methods that combine these with regression. The usage of these methods is illustrated using a data set about endoscopic vein-graft harvesting in coronary artery bypass graft (CABG) surgeries. We discuss important aspects of the implementation of these methods such as model specification and standard error calculations. Based on the comparison, we provide general guidelines for using propensity score methods with multilevel data in practice. We also provide the relevant code in the form of an R package, available on GitHub.
In observational studies, subjects are not assigned to treatment at random as they are in randomized experiments, and thus the association between the treatment and the outcome can be due to some unmeasured variable that affects both. Chapter 3 focuses on conducting sensitivity analysis to assess the robustness of the estimated quantity when the unconfoundedness assumption is violated. Two approaches to sensitivity analysis are presented; both extend previous work to accommodate a count outcome. The first method is based on the subclassification estimator and relies on maximum likelihood estimation. The second method is based on simulations and is more flexible in the choice of estimation method. We illustrate both methods using a data set from a traffic safety research study about the safety effectiveness (measured in crash count reduction) of the combined application of center line rumble strips and shoulder rumble strips on two-lane rural roads in Pennsylvania.
Chapter 4 proposes a method for estimating heterogeneous causal effects in observational studies by augmenting additive-interactive Gaussian process regression with the propensity scores, yielding a flexible yet robust way to predict the potential outcome surface from which the conditional treatment effects can be calculated. We show that our method works well even in the presence of strong confounding and illustrate this by comparing it with commonly used methods in different settings using simulated data.
Finally, Chapter 5 concludes this dissertation and discusses possible future work for each of the projects.
Item Open Access Multivariate Spatial Process Gradients with Environmental Applications (2014) Terres, Maria Antonia
Previous papers have elaborated formal gradient analysis for spatial processes, focusing on the distribution theory for directional derivatives associated with a response variable assumed to follow a Gaussian process model. In the current work, these ideas are extended to additionally accommodate one or more continuous covariate(s) whose directional derivatives are of interest and to relate the behavior of the directional derivatives of the response surface to those of the covariate surface(s). It is of interest to assess whether, in some sense, the gradients of the response follow those of the explanatory variable(s), thereby gaining insight into the local relationships between the variables. The joint Gaussian structure of the spatial random effects and associated directional derivatives allows for explicit distribution theory and, hence, kriging across the spatial region using multivariate normal theory. The gradient analysis is illustrated for bivariate and multivariate spatial models, non-Gaussian responses such as presence-absence and point patterns, and outlined for several additional spatial modeling frameworks that commonly arise in the literature. Working within a hierarchical modeling framework, posterior samples enable all gradient analyses to occur as post model fitting procedures.
Item Open Access Uncertainty in the Bifurcation Diagram of a Model of Heart Rhythm Dynamics (2014) Ring, Caroline
To understand the underlying mechanisms of cardiac arrhythmias, computational models are used to study heart rhythm dynamics. The parameters of these models carry inherent uncertainty. Therefore, to interpret the results of these models, uncertainty quantification (UQ) and sensitivity analysis (SA) are important. Polynomial chaos (PC) is a computationally efficient method for UQ and SA in which a model output Y, dependent on some independent uncertain parameters represented by a random vector ξ, is approximated as a spectral expansion in multidimensional orthogonal polynomials in ξ. The expansion can then be used to characterize the uncertainty in Y.
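As a rough illustration of the PC idea described above, the sketch below builds a one-dimensional Legendre expansion for a single parameter ξ assumed uniform on [-1, 1] and reads the mean and variance of Y off the coefficients. It is only a toy: the dissertation works with four parameters and estimates coefficients by Bayesian inference, whereas this sketch uses plain midpoint quadrature, and the model Y = ξ² and all function names are hypothetical.

```python
def legendre(k, x):
    """Evaluate the Legendre polynomial P_k(x) by the three-term recurrence."""
    p_prev, p = 1.0, x
    if k == 0:
        return p_prev
    for n in range(1, k):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p

def pc_coefficients(model, order, n_quad=2000):
    """Project model(xi), xi ~ Uniform(-1, 1), onto P_0..P_order.
    Uses c_k = (2k + 1) / 2 * integral of model * P_k over [-1, 1],
    with the integral approximated by the midpoint rule."""
    h = 2.0 / n_quad
    nodes = [-1.0 + (i + 0.5) * h for i in range(n_quad)]
    coeffs = []
    for k in range(order + 1):
        integral = sum(model(x) * legendre(k, x) for x in nodes) * h
        coeffs.append((2 * k + 1) / 2.0 * integral)
    return coeffs

def pc_mean_variance(coeffs):
    """Mean is c_0; variance is the sum of c_k^2 * E[P_k^2] for k >= 1,
    where E[P_k^2] = 1 / (2k + 1) under the uniform density on [-1, 1]."""
    mean = coeffs[0]
    variance = sum(c * c / (2 * k + 1) for k, c in enumerate(coeffs) if k >= 1)
    return mean, variance

# Hypothetical model output Y = xi^2.
coeffs = pc_coefficients(lambda xi: xi * xi, order=3)
mean, variance = pc_mean_variance(coeffs)
print(coeffs, mean, variance)
```

For this toy model the exact expansion is ξ² = (1/3)P₀ + (2/3)P₂, so the mean is 1/3 and the variance is 4/45; the quadrature reproduces these values approximately.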
PC methods were applied to UQ and SA of the dynamics of a two-dimensional return-map model of cardiac action potential duration (APD) restitution in a paced single cell. Uncertainty was considered in four parameters of the model: three time constants and the pacing stimulus strength. The basic cycle length (BCL) (the period between stimuli) was treated as the control parameter. Model dynamics was characterized with bifurcation analysis, which determines the APD and stability of fixed points of the model at a range of BCLs, and the BCLs at which bifurcations occur. These quantities can be plotted in a bifurcation diagram, which summarizes the dynamics of the model. PC UQ and SA were performed for these quantities. UQ results were summarized in a novel probabilistic bifurcation diagram that visualizes the APD and stability of fixed points as uncertain quantities.
Classical PC methods assume that model outputs exist and are reasonably smooth over the full domain of ξ. Because models of heart rhythm often exhibit bifurcations and discontinuities, their outputs may not obey the existence and smoothness assumptions on the full domain, but only on some subdomains, which may be irregularly shaped. On these subdomains, the random variables representing the parameters may no longer be independent. PC methods therefore must be modified for analysis of these discontinuous quantities. The Rosenblatt transformation maps the variables on the subdomain onto a rectangular domain; the transformed variables are independent and uniformly distributed. A new numerical estimation of the Rosenblatt transformation was developed that improves accuracy and computational efficiency compared to existing kernel density estimation methods. PC representations of the outputs in the transformed variables were then constructed. Coefficients of the PC expansions were estimated using Bayesian inference methods. For discontinuous model outputs, SA was performed using a sampling-based variance-reduction method, with the PC estimation used as an efficient proxy for the full model.
To evaluate the accuracy of the PC methods, PC UQ and SA results were compared to large-sample Monte Carlo (MC) UQ and SA results. PC UQ and SA results for the fixed point APDs, and for the probability that a stable fixed point existed at each BCL, were very close to the MC results for those quantities. However, PC UQ and SA results for the bifurcation BCLs were less accurate compared to the MC results.
The computational time required for PC and Monte Carlo methods was also compared. PC analysis (including Rosenblatt transformation and Bayesian inference) required less than 10 total hours of computational time, of which approximately 30 minutes was devoted to model evaluations, compared to approximately 65 hours required for Monte Carlo sampling of the model outputs at 1 × 10^6 ξ points.
PC methods provide a useful framework for efficient UQ and SA of the bifurcation diagram of a model of cardiac APD dynamics. Model outputs with bifurcations and discontinuities can be analyzed using modified PC methods. The methods applied and developed in this study may be extended to other models of heart rhythm dynamics. These methods have potential for use for uncertainty and sensitivity analysis in many applications of these models, including simulation studies of heart rate variability, cardiac pathologies, and interventions.