SMERED: A Bayesian Approach to Graphical Record Linkage and De-duplication

Abstract

We propose a novel unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation is to represent the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible new representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate k-way posterior probabilities of matches across records, and propagate the uncertainty of record linkage into later analyses. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previously proposed methods of record linkage, despite the high-dimensional parameter space. We assess our results on real and simulated data.
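
The bipartite representation can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that posterior draws of the linkage structure are already available, each draw being a list that maps every record to the index of a latent individual. The function names (`clusters`, `match_probability`) and the sample values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def clusters(labels):
    """Group record indices by the latent individual each record points to."""
    groups = defaultdict(list)
    for record, individual in enumerate(labels):
        groups[individual].append(record)
    return [tuple(members) for members in groups.values()]


def match_probability(samples, records):
    """Share of posterior draws in which all given records share one latent individual."""
    records = list(records)
    hits = sum(
        1 for labels in samples
        if len({labels[r] for r in records}) == 1
    )
    return hits / len(samples)


# Hypothetical posterior draws for 4 records (illustrative values, not real output).
# Entry i of each draw is the latent individual that record i is linked to.
samples = [
    [0, 0, 1, 2],
    [0, 0, 0, 2],
    [1, 1, 0, 2],
    [0, 1, 1, 2],
]

first_draw_clusters = clusters(samples[0])        # records grouped by individual
p_pair = match_probability(samples, [0, 1])       # 2-way match probability
p_triple = match_probability(samples, [0, 1, 2])  # 3-way match probability
```

Because records are linked to each other only through a shared latent individual, a k-way match probability is just the fraction of draws in which k records carry the same label, and matches are transitive by construction.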

Scholars@Duke

Rebecca Carter Steorts

Associate Professor of Statistical Science

You can find more information about my research group and work at:

https://resteorts.github.io/

Recent papers of mine can be found at:

https://arxiv.org/search/?query=steorts&searchtype=all&source=header


Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.