
A Bayesian Approach to Graphical Record Linkage and Deduplication

dc.contributor.author Steorts, RC
dc.contributor.author Hall, R
dc.contributor.author Fienberg, SE
dc.date.accessioned 2016-04-14T20:02:46Z
dc.date.issued 2016-10-01
dc.identifier.issn 0162-1459
dc.identifier.uri https://hdl.handle.net/10161/11817
dc.description.abstract © 2016 American Statistical Association. We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation involves the representation of the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate transitive linkage probabilities across records (and represent this visually), and propagate the uncertainty of record linkage into later analyses. Our method makes it particularly easy to integrate record linkage with post-processing procedures such as logistic regression, capture–recapture, etc. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previous record linkage approaches, despite the high-dimensional parameter space. We illustrate our method using longitudinal data from the National Long Term Care Survey and with data from the Italian Survey on Household and Wealth, where we assess the accuracy of our method and show it to be better in terms of error rates and empirical scalability than other approaches in the literature. Supplementary materials for this article are available online.
dc.publisher Informa UK Limited
dc.relation.ispartof Journal of the American Statistical Association
dc.relation.isversionof 10.1080/01621459.2015.1105807
dc.title A Bayesian Approach to Graphical Record Linkage and Deduplication
dc.type Journal article
duke.contributor.id Steorts, RC|0682018
pubs.begin-page 1660
pubs.end-page 1672
pubs.issue 516
pubs.organisational-group Basic Science Departments
pubs.organisational-group Biostatistics & Bioinformatics
pubs.organisational-group Computer Science
pubs.organisational-group Duke
pubs.organisational-group School of Medicine
pubs.organisational-group Statistical Science
pubs.organisational-group Trinity College of Arts & Sciences
pubs.publication-status Published
pubs.volume 111
dc.identifier.eissn 1537-274X

