Simultaneous Edit and Imputation For Household Data with Structural Zeros

Abstract

Multivariate categorical data nested within households often include missing values as well as reported values that fail edit constraints (for example, a participating household reports a child's age as older than that of the child's biological parent). Generally, agencies prefer datasets to be free from erroneous or missing values before analyzing them or disseminating them to secondary data users. We present a model-based engine for editing and imputation of household data based on a Bayesian hierarchical model that includes (i) a nested data Dirichlet process mixture of products of multinomial distributions as the model for the true latent values of the data, truncated to allow only households that satisfy all edit constraints, (ii) a model for the location of errors, and (iii) a reporting model for the observed responses in error. The approach propagates uncertainty due to unknown locations of errors and missing values, generates plausible datasets that satisfy all edit constraints, and can preserve multivariate relationships within and across individuals in the same household. We illustrate the approach using data from the 2012 American Community Survey.
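
To make the idea of an edit constraint and a truncated model concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a single toy edit rule (every biological child must be strictly younger than the household head) and uses plain rejection sampling from independent draws as a stand-in for truncating a mixture model to valid households. All names (`satisfies_edits`, `draw_valid_household`, the relationship codes, the age range) are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical relationship codes; these are not taken from the paper.
HEAD = "head"
CHILD = "biological_child"


def satisfies_edits(household):
    """Return True if the household passes the toy edit rules.

    Assumed rules: exactly one household head, and every biological
    child is strictly younger than the head.
    """
    head_ages = [p["age"] for p in household if p["relation"] == HEAD]
    if len(head_ages) != 1:
        return False
    head_age = head_ages[0]
    return all(p["age"] < head_age for p in household if p["relation"] == CHILD)


def draw_person(rng, is_head):
    """Draw one person with an age chosen independently of everyone else."""
    relation = HEAD if is_head else CHILD
    return {"relation": relation, "age": rng.randint(0, 90)}


def draw_valid_household(rng, size, max_tries=10_000):
    """Rejection sampling: redraw until the household satisfies all edits.

    Impossible combinations (structural zeros) are never returned, which
    mimics restricting the model's support to valid households.
    """
    for _ in range(max_tries):
        household = [draw_person(rng, i == 0) for i in range(size)]
        if satisfies_edits(household):
            return household
    raise RuntimeError("no valid household found within max_tries")


rng = random.Random(0)
household = draw_valid_household(rng, size=3)
```

Rejection sampling is used here only because it is the simplest way to show truncation; the abstract does not say how the truncated model is fitted or sampled.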

Citation

Published Version (Please cite this version)

10.1093/jssam/smy022

Publication Info

Akande, O, Andres Barrientos and Jerome Reiter (n.d.). Simultaneous Edit and Imputation For Household Data with Structural Zeros. Journal of Survey Statistics and Methodology. 10.1093/jssam/smy022 Retrieved from https://hdl.handle.net/10161/17928.

This is constructed from limited available data and may be imprecise. To cite this article, please review & use the official citation provided by the journal.

Scholars@Duke

Jerome P. Reiter

Professor of Statistical Science

My primary areas of research include methods for preserving data confidentiality, for handling missing values, for integrating information across multiple sources, and for the analysis of surveys and causal studies. I enjoy collaborating on data analyses with researchers who are not statisticians, particularly in the social sciences and public policy.


Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.