Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality
Abstract
To protect the confidentiality of survey respondents' identities and sensitive attributes,
statistical agencies can release data in which confidential values are replaced with
multiple imputations. These are called synthetic data. We propose a two-stage approach
to generating synthetic data that enables agencies to release different numbers of
imputations for different variables. Generation in two stages can reduce computational
burdens, decrease disclosure risk, and increase inferential accuracy relative to generation
in one stage. We present methods for obtaining inferences from such data. We describe
the application of two-stage synthesis to creating a public use file for a German
business database.
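As a rough illustration of how generation in two stages can yield different numbers of imputations for different variables, the sketch below assumes a nested layout: one variable is replaced m times in the first stage, and for each of those draws a second variable is replaced n times. The function name `two_stage_synthesis`, the toy records, and the normal and linear-regression models are hypothetical and are not taken from the article. For brevity the sketch plugs in point estimates rather than drawing model parameters from a posterior distribution, which a real synthesizer would normally do.

```python
import random
import statistics


def two_stage_synthesis(records, m, n, seed=0):
    """Toy nested synthesis (illustrative assumption, not the article's code).

    records: list of (x, y) pairs of confidential values.
    Returns m nests; each nest holds n synthetic datasets sharing one stage-1 draw.
    """
    rng = random.Random(seed)
    xs = [x for x, _ in records]
    ys = [y for _, y in records]
    mu_x, sd_x = statistics.mean(xs), statistics.stdev(xs)
    mu_y = statistics.mean(ys)

    # Least-squares fit of y on x, a stand-in for the stage-2 imputation model.
    sxx = sum((x - mu_x) ** 2 for x in xs)
    sxy = sum((x - mu_x) * (y - mu_y) for x, y in records)
    slope = sxy / sxx
    intercept = mu_y - slope * mu_x
    resid_sd = statistics.stdev([y - (intercept + slope * x) for x, y in records])

    nests = []
    for _ in range(m):
        # Stage 1: replace x once per nest.
        syn_x = [rng.gauss(mu_x, sd_x) for _ in records]
        nest = []
        for _ in range(n):
            # Stage 2: replace y n times, conditional on the same stage-1 x.
            syn_y = [rng.gauss(intercept + slope * x, resid_sd) for x in syn_x]
            nest.append(list(zip(syn_x, syn_y)))
        nests.append(nest)
    return nests


# Hypothetical confidential records, used only to exercise the sketch.
records = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
nests = two_stage_synthesis(records, m=3, n=2)
```

Under this layout the first variable has m distinct synthetic versions while the second has m times n, so an agency could choose m and n separately for each group of variables.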
Type
Journal article
Permalink
https://hdl.handle.net/10161/4624
Scholars@Duke
Jerome P. Reiter
Professor of Statistical Science
My primary areas of research include methods for preserving data confidentiality,
for handling missing values, for integrating information across multiple sources,
and for the analysis of surveys and causal studies. I enjoy collaborating on data
analyses with researchers who are not statisticians, particularly in the social sciences
and public policy.

Rights for Collection: Scholarly Articles