SMERED: A Bayesian Approach to Graphical Record Linkage and De-duplication
Abstract
We propose a novel unsupervised approach for linking records across arbitrarily many
files, while simultaneously detecting duplicate records within files. Our key innovation
is to represent the pattern of links between records as a bipartite graph, in
which records are directly linked to latent true individuals, and only indirectly
linked to other records. This flexible new representation of the linkage structure
naturally allows us to estimate the attributes of the unique observable people in
the population, calculate $k$-way posterior probabilities of matches across records,
and propagate the uncertainty of record linkage into later analyses. Our linkage structure
lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm,
which overcomes many obstacles encountered by previously proposed methods of record
linkage, despite the high dimensional parameter space. We assess our results on real
and simulated data.
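
The bipartite representation described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `clusters_from_linkage` and `match_probability`, the `(file, index)` record identifiers, and the four toy posterior samples are all hypothetical and chosen only for illustration. Each sample maps every record to the label of a latent individual, so two records are linked only indirectly, by pointing at the same individual. A k-way match probability is then approximated as the fraction of samples in which all k records share one individual.

```python
from collections import defaultdict


def clusters_from_linkage(linkage):
    """Group record ids by the latent individual each record points to."""
    groups = defaultdict(set)
    for record, individual in linkage.items():
        groups[individual].add(record)
    return {frozenset(group) for group in groups.values()}


def match_probability(samples, records):
    """Fraction of sampled linkage structures in which every record in
    `records` is attached to the same latent individual."""
    records = list(records)
    hits = sum(1 for s in samples if len({s[r] for r in records}) == 1)
    return hits / len(samples)


# Hypothetical posterior samples (made-up values, not results from the paper).
# Keys are (file, index) record ids; values are latent individual labels.
samples = [
    {("A", 0): 1, ("A", 1): 1, ("B", 0): 1, ("B", 1): 2},
    {("A", 0): 1, ("A", 1): 3, ("B", 0): 1, ("B", 1): 2},
    {("A", 0): 1, ("A", 1): 1, ("B", 0): 1, ("B", 1): 2},
    {("A", 0): 4, ("A", 1): 1, ("B", 0): 4, ("B", 1): 2},
]

# 2-way match across files A and B.
p_cross = match_probability(samples, [("A", 0), ("B", 0)])

# Duplicate detection within file A uses the same mechanism.
p_dup = match_probability(samples, [("A", 0), ("A", 1)])

# 3-way match: all three records refer to one individual.
p_three = match_probability(samples, [("A", 0), ("A", 1), ("B", 0)])

# Linked groups implied by the first sample.
first_clusters = clusters_from_linkage(samples[0])
```

Because links across files and duplicates within a file are both read off the same record-to-individual map, a single structure serves both tasks, and averaging over samples carries the linkage uncertainty into whatever quantity is computed from them.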
Type
Journal article
Permalink
https://hdl.handle.net/10161/11815
Scholars@Duke
Rebecca Carter Steorts
Assistant Professor of Statistical Science
You can find more information about my research group and work at: https://resteorts.github.io/
Recent papers of mine can be found at https://arxiv.org/search/?query=steorts&searchtype=all&source=header

Articles written by Duke faculty are made available through the campus open access policy. For more information see: Duke Open Access Policy
Rights for Collection: Scholarly Articles