Record Linkage Methods with Applications to Causal Inference and Election Voting Data

Wortman, Joan Pearson Heck

Date

2019

Authors

Wortman, Joan Pearson Heck

Advisors

Reiter, Jerome P

Abstract

Probabilistic record linkage enables researchers and analysts to combine data from multiple sources for statistical analysis. Such analyses may aim to answer causal questions, predict future outcomes, or provide descriptive statistics. In this dissertation, I develop methodology for probabilistic record linkage for two scenarios: general causal inference applications with linked data, and identifying previously removed voters in North Carolina who cast provisional ballots in 2016.

In Chapter 2, we develop methodology for causal inference in observational studies when using propensity score subclassification on data constructed with probabilistic record linkage techniques. We focus on scenarios where covariates and binary treatment assignments are in one file and outcomes are in another file, and the goal is to estimate an additive treatment effect by merging the files. We assume that the files can be linked using variables common to both files, e.g., names or birth dates, but that links are subject to errors, e.g., due to reporting errors in the linking variables. We develop methodology for cases where such reporting errors are independent of the other variables on the files. We describe conceptually how linkage errors can affect causal estimates in subclassification contexts. We also present and evaluate several algorithms for deciding which record pairs to use in estimation of causal effects. Using simulation studies, we demonstrate that case selection procedures can result in improved accuracy in estimates of treatment effects from linked data compared to using only cases known to be true links.
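As a rough illustration of the estimator that Chapter 2 builds on, the sketch below computes an additive treatment effect by propensity score subclassification on an already-merged file. It is not the dissertation's method: it assumes propensity scores have been estimated beforehand, treats every record as a correct link, and uses hypothetical names (`subclassification_effect`, `n_subclasses`). Handling linkage error, which is the chapter's actual contribution, is not shown.

```python
from statistics import mean


def subclassification_effect(records, n_subclasses=5):
    """Estimate an additive treatment effect by subclassification.

    records: list of (propensity_score, treated, outcome) tuples, where
    treated is 0 or 1. Assumes scores were estimated elsewhere.
    """
    # Order units by propensity score, then cut into equal-sized subclasses.
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    weighted_sum = 0.0
    total_weight = 0
    for k in range(n_subclasses):
        stratum = ordered[k * n // n_subclasses:(k + 1) * n // n_subclasses]
        treated = [y for _, t, y in stratum if t == 1]
        control = [y for _, t, y in stratum if t == 0]
        if not treated or not control:
            continue  # a subclass without both groups gives no contrast
        # Within-subclass difference in means, weighted by subclass size.
        weighted_sum += len(stratum) * (mean(treated) - mean(control))
        total_weight += len(stratum)
    if total_weight == 0:
        raise ValueError("no subclass contains both treated and control units")
    return weighted_sum / total_weight
```

If a record pair is a false link, its outcome belongs to a different unit than its covariates and treatment, so the within-subclass means above are contaminated; this is the kind of effect the chapter describes conceptually.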

In Chapter 3, we introduce a model for Bayesian record linkage and clustered sub-models, which we call BRACS. The model is designed for combining two sets of data in which there are differences in the comparison distributions for links and non-links, conditional on attributes observed in one of the files. We use simulation studies to demonstrate that the proposed approach can yield improvements in classifying record pairs as links versus non-links.
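The abstract does not specify the BRACS model, so the following is only a minimal sketch of the general idea it describes: comparison distributions for links and non-links that differ by a cluster determined from attributes in one file. It uses a classical Fellegi-Sunter style calculation with conditionally independent binary field comparisons. The cluster labels, the probabilities in `PARAMS`, and the function names are all hypothetical and chosen purely for illustration.

```python
def link_probability(agreements, m_probs, u_probs, prior_link):
    """Posterior probability that a record pair is a link.

    agreements: per-field booleans (True if the field agrees).
    m_probs / u_probs: per-field agreement probabilities among links
    and non-links, assuming conditionally independent fields.
    """
    like_link = prior_link
    like_nonlink = 1.0 - prior_link
    for agrees, m, u in zip(agreements, m_probs, u_probs):
        like_link *= m if agrees else 1.0 - m
        like_nonlink *= u if agrees else 1.0 - u
    return like_link / (like_link + like_nonlink)


# Hypothetical cluster-specific parameters (not taken from the dissertation).
PARAMS = {
    "cluster_a": {"m": [0.95, 0.90], "u": [0.05, 0.10]},
    "cluster_b": {"m": [0.80, 0.90], "u": [0.05, 0.10]},
}


def clustered_link_probability(agreements, cluster, prior_link=0.01):
    """Use the comparison distributions of the record's cluster."""
    p = PARAMS[cluster]
    return link_probability(agreements, p["m"], p["u"], prior_link)
```

In this sketch the same agreement pattern yields a different link probability depending on the cluster, which is the behavior a single pooled set of parameters cannot express.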

In Chapter 4, we apply BRACS to 2016 voting data from North Carolina. We describe the process of provisional voting and the list of provisional voters provided by the North Carolina Board of Elections. We provide background on the North Carolina voter file, of which we use a snapshot from November 2016. We outline the limitations of exact-matching the two files using only the state-provided identifiers. Finally, we use BRACS to link the two files, with and without the state-provided identifiers, in order to estimate the number of removed voters who cast provisional ballots in the November 2016 election in North Carolina.

In Chapter 5, we modify BRACS to relax the assumption of conditionally independent field comparisons, motivated by the correlation between party registration and race in North Carolina. We outline a method for accounting for this correlation, in which we combine two dependent comparison fields into one joint comparison field. We use simulation studies to demonstrate that this modification can yield improvements in linkage quality, and we also outline when it may not be appropriate. Finally, we apply the modified model to the data from Chapter 4 and re-estimate the number of removed voters with the joint-comparison BRACS.
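One simple way to picture a joint comparison field is to replace two binary agreement indicators with a single four-level categorical comparison that has its own probability table for links and for non-links. The sketch below assumes binary agreement on party and race and uses hypothetical names (`joint_level`, `m_joint`, `u_joint`); the dissertation's actual construction may differ.

```python
def joint_level(agree_party, agree_race):
    """Map two binary comparisons to one level in {0, 1, 2, 3}.

    0: neither agrees, 1: only race agrees,
    2: only party agrees, 3: both agree.
    """
    return 2 * int(agree_party) + int(agree_race)


def joint_link_probability(level, m_joint, u_joint, prior_link):
    """Posterior link probability from a single joint comparison field.

    m_joint / u_joint: probabilities of each of the four levels among
    links and non-links. Each table sums to 1 and need not factor into
    a product of per-field probabilities.
    """
    like_link = prior_link * m_joint[level]
    like_nonlink = (1.0 - prior_link) * u_joint[level]
    return like_link / (like_link + like_nonlink)
```

Because the four-level tables are free to assign any probabilities, they can capture dependence between the two fields. The cost is more parameters to estimate, which may be one reason such a combination is not always appropriate.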

Type

Dissertation

Department

Statistical Science

Subjects

Statistics, Entity Resolution, File Linkage, Matching, Propensity score, Record Linkage, Subclassification

Permalink

https://hdl.handle.net/10161/18657

Citation

Wortman, Joan Pearson Heck (2019). Record Linkage Methods with Applications to Causal Inference and Election Voting Data. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/18657.

Collections

Dissertations

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.
