Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality

Reiter, JP; Drechsler, J

Date

2010-01-01

Authors

Reiter, JP

Drechsler, J

Abstract

To protect the confidentiality of survey respondents' identities and sensitive attributes, statistical agencies can release data in which confidential values are replaced with multiple imputations. These are called synthetic data. We propose a two-stage approach to generating synthetic data that enables agencies to release different numbers of imputations for different variables. Generation in two stages can reduce computational burdens, decrease disclosure risk, and increase inferential accuracy relative to generation in one stage. We present methods for obtaining inferences from such data. We describe the application of two-stage synthesis to creating a public use file for a German business database.
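
As a rough illustration of how generating in two stages can give different variables different numbers of imputations, the following Python sketch assumes a nested layout: `m` first-stage draws, each paired with `r` second-stage draws. Everything in it is hypothetical and not taken from the article: the function name `two_stage_synthesis`, the `(x, y, z)` record layout, and the toy normal and linear-regression synthesizers, which use plug-in estimates rather than proper posterior draws. It shows only the nesting structure, not the authors' models or their inferential methods.

```python
import math
import random
import statistics


def two_stage_synthesis(records, m, r, seed=0):
    """Return m nests, each holding r synthetic datasets.

    Hypothetical setup: each record is (x, y, z). x is released unchanged,
    y is replaced in stage one, and z is replaced in stage two.
    """
    rng = random.Random(seed)
    ys = [rec[1] for rec in records]
    zs = [rec[2] for rec in records]
    n = len(records)

    # Toy stage-one model (assumption): y ~ Normal(mean, sd) fitted to the data.
    y_mu, y_sd = statistics.mean(ys), statistics.stdev(ys)

    # Toy stage-two model (assumption): z = a + b * y + noise, fitted by
    # ordinary least squares on the original data.
    z_mu = statistics.mean(zs)
    sxx = sum((y - y_mu) ** 2 for y in ys)
    sxy = sum((y - y_mu) * (z - z_mu) for y, z in zip(ys, zs))
    b = sxy / sxx
    a = z_mu - b * y_mu
    rss = sum((z - (a + b * y)) ** 2 for y, z in zip(ys, zs))
    resid_sd = math.sqrt(rss / (n - 2))

    nests = []
    for _ in range(m):
        # Stage one: a single synthetic y per record, shared by the whole nest.
        y_syn = [rng.gauss(y_mu, y_sd) for _ in records]
        nest = []
        for _ in range(r):
            # Stage two: a fresh synthetic z, conditional on this nest's y.
            z_syn = [a + b * y + rng.gauss(0.0, resid_sd) for y in y_syn]
            nest.append(
                [(rec[0], y, z) for rec, y, z in zip(records, y_syn, z_syn)]
            )
        nests.append(nest)
    return nests


# Made-up example data, for illustration only.
records = [(1, 2.0, 3.1), (2, 4.0, 5.2), (3, 6.0, 6.8), (4, 8.0, 9.1), (5, 10.0, 10.9)]
nests = two_stage_synthesis(records, m=3, r=2)
print(len(nests), "nests of", len(nests[0]), "datasets each")
```

Under this assumed layout, `y` is imputed only `m` times while `z` is imputed `m * r` times, which is one way the number of imputations can differ across variables.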

Type

Journal article

Permalink

https://hdl.handle.net/10161/4624

Collections

Scholarly Articles

Scholars@Duke

Jerome P. Reiter

Professor of Statistical Science

My primary areas of research include methods for preserving data confidentiality, for handling missing values, for integrating information across multiple sources, and for the analysis of surveys and causal studies. I enjoy collaborating on data analyses with researchers who are not statisticians, particularly in the social sciences and public policy.

Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.
