Variational Bayes for Merging Noisy Databases

dc.contributor.author

Broderick, Tamara

dc.contributor.author

Steorts, Rebecca C

dc.date.accessioned

2016-04-14T19:50:17Z

dc.description.abstract

Bayesian entity resolution merges multiple noisy databases and returns the minimal collection of unique individuals represented, together with their true, latent record values. Bayesian methods allow flexible generative models that share power across databases, as well as principled quantification of uncertainty for queries of the final, resolved database. However, existing Bayesian methods for entity resolution use Markov chain Monte Carlo (MCMC) approximations and are too slow to run on modern databases containing millions or billions of records. Instead, we propose applying variational approximations to allow scalable Bayesian inference in these models. We derive a coordinate-ascent approximation for mean-field variational Bayes, qualitatively compare our algorithm to existing methods, note unique challenges for inference that arise from the expected distribution of cluster sizes in entity resolution, and discuss directions for future work in this domain.

dc.format.extent

12 pages

dc.identifier

http://arxiv.org/abs/1410.4792v1

dc.identifier.uri

https://hdl.handle.net/10161/11814

dc.subject

stat.ME

dc.subject

stat.ML

dc.title

Variational Bayes for Merging Noisy Databases

dc.type

Journal article

pubs.author-url

http://arxiv.org/abs/1410.4792v1

pubs.organisational-group

Basic Science Departments

pubs.organisational-group

Biostatistics & Bioinformatics

pubs.organisational-group

Computer Science

pubs.organisational-group

Duke

pubs.organisational-group

School of Medicine

pubs.organisational-group

Statistical Science

pubs.organisational-group

Trinity College of Arts & Sciences

Files

Original bundle

Name: 1410.4792v1.pdf
Size: 113.44 KB
Format: Adobe Portable Document Format