SMERED: A Bayesian Approach to Graphical Record Linkage and De-duplication

dc.contributor.author

Steorts, RC

dc.contributor.author

Hall, R

dc.contributor.author

Fienberg, SE

dc.date.accessioned

2016-04-14T19:54:17Z

dc.description.abstract

We propose a novel unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation is to represent the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible new representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate $k$-way posterior probabilities of matches across records, and propagate the uncertainty of record linkage into later analyses. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previously proposed methods of record linkage, despite the high-dimensional parameter space. We assess our results on real and simulated data.
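The bipartite representation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the list-based encoding of the linkage structure, and the toy "posterior samples" are all assumptions made for illustration. Each sample maps a record index to the label of a latent individual, so records match only indirectly, by pointing to the same individual, and a $k$-way match probability is estimated as the fraction of samples in which all $k$ records agree.

```python
from typing import Sequence


def records_match(linkage: Sequence[int], records: Sequence[int]) -> bool:
    """Return True if every listed record points to the same latent individual.

    `linkage[r]` is the (hypothetical) label of the latent individual
    that record `r` is linked to in one sampled linkage structure.
    """
    return len({linkage[r] for r in records}) == 1


def kway_match_probability(
    samples: Sequence[Sequence[int]], records: Sequence[int]
) -> float:
    """Estimate a k-way match probability from sampled linkage structures.

    The estimate is the fraction of samples in which all `records`
    are linked to one and the same latent individual.
    """
    hits = sum(records_match(s, records) for s in samples)
    return hits / len(samples)


# Toy, made-up samples for 4 records (not data from the paper).
samples = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 2, 1, 0],
    [0, 0, 1, 0],
]

p_pair = kway_match_probability(samples, [0, 1])       # records 0 and 1
p_triple = kway_match_probability(samples, [0, 1, 3])  # records 0, 1 and 3
```

Because matches are read off the record-to-individual links, the same samples answer pairwise and higher-order queries without any extra bookkeeping.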

dc.identifier

http://arxiv.org/abs/1403.0211v1

dc.identifier.uri

https://hdl.handle.net/10161/11815

dc.subject

stat.CO

dc.subject

stat.AP

dc.title

SMERED: A Bayesian Approach to Graphical Record Linkage and De-duplication

dc.type

Journal article

pubs.author-url

http://arxiv.org/abs/1403.0211v1

pubs.notes

AISTATS (2014), to appear; 9 pages with references, 2 page supplement, 4 figures. Shorter version of arXiv:1312.4645

pubs.organisational-group

Basic Science Departments

pubs.organisational-group

Biostatistics & Bioinformatics

pubs.organisational-group

Computer Science

pubs.organisational-group

Duke

pubs.organisational-group

School of Medicine

pubs.organisational-group

Statistical Science

pubs.organisational-group

Trinity College of Arts & Sciences

Files

Original bundle

Name: 1403.0211v1.pdf
Size: 415.16 KB
Format: Adobe Portable Document Format
Description: Published version