Simultaneous Edit and Imputation For Household Data with Structural Zeros
Multivariate categorical data nested within households often include missing values as well as reported values that fail edit constraints (for example, a household reports a child as older than the child's biological parent). Agencies generally prefer datasets to be free of erroneous and missing values before analyzing them or disseminating them to secondary data users. We present a model-based engine for editing and imputing household data. It rests on a Bayesian hierarchical model with three components: (i) a nested data Dirichlet process mixture of products of multinomial distributions as the model for the true latent values of the data, truncated to allow only households that satisfy all edit constraints, (ii) a model for the location of errors, and (iii) a reporting model for the observed responses in error. The approach propagates uncertainty due to unknown locations of errors and missing values, generates plausible datasets that satisfy all edit constraints, and can preserve multivariate relationships within and across individuals in the same household. We illustrate the approach using data from the 2012 American Community Survey.
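To make the idea of truncating a model to households that satisfy all edit constraints concrete, here is a minimal Python sketch. It is not the authors' sampler and does not implement the nested Dirichlet process mixture. It only shows, by simple rejection sampling from a toy model, how infeasible households (structural zeros) are excluded from the support. The function names, the age ranges, and the single child-younger-than-head rule are all assumptions made for illustration.

```python
import random

def satisfies_edits(household):
    """Toy edit rule (an assumption): every child must be younger than the head."""
    head_age = household[0]["age"]
    return all(
        member["age"] < head_age
        for member in household[1:]
        if member["relation"] == "child"
    )

def sample_household(rng, size=3):
    """Draw one household from an untruncated toy model with independent ages."""
    head = {"relation": "head", "age": rng.randint(18, 90)}
    children = [
        {"relation": "child", "age": rng.randint(0, 90)} for _ in range(size - 1)
    ]
    return [head] + children

def sample_feasible_household(rng, size=3, max_tries=10_000):
    """Rejection sampling: redraw until the household passes every edit check."""
    for _ in range(max_tries):
        household = sample_household(rng, size)
        if satisfies_edits(household):
            return household
    raise RuntimeError("no feasible household found within max_tries")

rng = random.Random(0)
households = [sample_feasible_household(rng) for _ in range(5)]
```

Every household in `households` passes `satisfies_edits` by construction, which mirrors the property that generated datasets contain no edit-failing records.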
Published Version (Please cite this version): 10.1093/jssam/smy022
Publication Info: Barrientos, Andres; Akande, Olanrewaju; & Reiter, Jerome P (n.d.). Simultaneous Edit and Imputation For Household Data with Structural Zeros. Journal of Survey Statistics and Methodology. 10.1093/jssam/smy022. Retrieved from https://hdl.handle.net/10161/17928.
This is constructed from limited available data and may be imprecise. To cite this article, please review & use the official citation provided by the journal.
Instructor in the Social Science Research Institute
I work on developing statistical methodology for handling missing and faulty data, with particular emphasis on applications that intersect with the social sciences. I am especially motivated to develop methods that can be readily applied by statistical agencies and data analysts. I completed my PhD in statistical science at Duke in 2019, under the supervision of Jerry Reiter. I obtained an MSc in Statistical and Economic modeling from Duke in 2015, and a BSc in Mathematics and Statistics from
Alphabetical list of authors with Scholars@Duke profiles.