Variational Bayes for Merging Noisy Databases

Broderick, Tamara; Steorts, Rebecca C



Abstract

Bayesian entity resolution merges multiple noisy databases and returns the minimal collection of unique individuals represented, together with their true, latent record values. Bayesian methods allow flexible generative models that share power across databases, as well as principled quantification of uncertainty for queries of the final, resolved database. However, existing Bayesian methods for entity resolution use Markov chain Monte Carlo (MCMC) approximations and are too slow to run on modern databases containing millions or billions of records. Instead, we propose applying variational approximations to allow scalable Bayesian inference in these models. We derive a coordinate-ascent approximation for mean-field variational Bayes, qualitatively compare our algorithm to existing methods, note unique challenges for inference that arise from the expected distribution of cluster sizes in entity resolution, and discuss directions for future work in this domain.
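To make "coordinate-ascent mean-field variational Bayes" concrete, here is a minimal sketch for a deliberately simplified, hypothetical record-linkage model; it is not the authors' model or algorithm. Assumptions: each record has a single categorical field; record j links to latent entity `lambda_j`; entity i has a true value `y_i`; an observed value keeps the true value with probability `1 - beta` and is otherwise redrawn from the empirical value frequencies `theta`; priors on `lambda_j` and `y_i` are uniform; `beta` is fixed. The factors q(lambda_j) and q(y_i) are updated in turn, each set to the exponentiated expected log-joint under the other. All function and parameter names are illustrative.

```python
import numpy as np


def normalize_log(log_weights):
    """Convert unnormalized log-probabilities to probabilities along the last axis (log-sum-exp trick)."""
    shifted = log_weights - log_weights.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)


def distortion_likelihood(n_categories, theta, beta):
    """Hypothetical distortion model: L[x, v] = (1 - beta) * [x == v] + beta * theta[x]."""
    return (1.0 - beta) * np.eye(n_categories) + beta * theta[:, None]


def cavi_record_linkage(records, n_entities, n_categories, beta=0.1, n_iters=100, tol=1e-8):
    """Mean-field CAVI for the toy model: q(lambda_j) over entities, q(y_i) over true values.

    records: integer values in range(n_categories), one per record.
    Uniform priors on lambda_j and y_i are assumed, so they drop out of the updates.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1 so all log-likelihoods are finite")
    records = np.asarray(records)
    n_records = len(records)
    if n_entities > n_records:
        raise ValueError("this sketch initializes each entity from a distinct record")

    theta = np.bincount(records, minlength=n_categories) / n_records  # empirical value frequencies
    lik = distortion_likelihood(n_categories, theta, beta)
    obs_ll = np.log(lik[records])  # obs_ll[j, v] = log p(x_j | y = v); finite because theta[x_j] > 0

    # Break the symmetry between entities: start entity i near the value of record i.
    init = lik[records[:n_entities]]
    q_y = init / init.sum(axis=1, keepdims=True)
    q_lam = np.full((n_records, n_entities), 1.0 / n_entities)

    for _ in range(n_iters):
        # log q(lambda_j = i) = sum_v q(y_i = v) * log p(x_j | v) + const
        q_lam = normalize_log(obs_ll @ q_y.T)
        # log q(y_i = v) = sum_j q(lambda_j = i) * log p(x_j | v) + const
        new_q_y = normalize_log(q_lam.T @ obs_ll)
        change = np.abs(new_q_y - q_y).max()
        q_y = new_q_y
        if change < tol:
            break
    return q_lam, q_y
```

For example, `cavi_record_linkage([0, 1, 2, 0, 1, 2], n_entities=3, n_categories=3)` assigns each pair of matching records mostly to a shared entity. This toy fixes the number of entities in advance; in entity resolution most clusters are expected to be very small, so the number of entities grows with the number of records, and the sketch does not address the cluster-size challenges the abstract refers to.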

Type

Journal article

Subjects

stat.ME, stat.ML

Permalink

https://hdl.handle.net/10161/11814

Collections

Scholarly Articles

Scholars@Duke

Rebecca Carter Steorts

Associate Professor of Statistical Science

You can find more information about my research group and work at:

https://resteorts.github.io/

Recent papers of mine can be found at

https://arxiv.org/search/?query=steorts&searchtype=all&source=header

Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.
