Bayesian Models for Imputing Missing Data and Editing Erroneous Responses in Surveys

dc.contributor.advisor

Reiter, Jerome P

dc.contributor.author

Akande, Olanrewaju Michael

dc.date.accessioned

2019-06-07T19:49:04Z

dc.date.available

2019-06-07T19:49:04Z

dc.date.issued

2019

dc.department

Statistical Science

dc.description.abstract

This thesis develops Bayesian methods for handling unit nonresponse, item nonresponse, and erroneous responses in large-scale surveys and censuses containing categorical data. I focus on two settings: nested household data such as the U.S. Decennial Census, where individuals are nested within households and certain combinations of the variables are not allowed, and surveys subject to both unit and item nonresponse, such as the Current Population Survey.

The first contribution is a Bayesian model for imputing plausible values for item nonresponse in data nested within households, in the presence of impossible combinations. The imputation is done using a nested data Dirichlet process mixture of products of multinomial distributions model, truncated so that impossible household configurations have zero probability in the model. I show how to generate imputations from the Markov chain Monte Carlo sampler, and describe strategies for improving the computational efficiency of the model estimation. I illustrate the performance of the approach with data that mimic the variables collected in the U.S. Decennial Census. The results indicate that my approach can generate high-quality imputations in such nested data.

The second contribution extends the imputation engine in the first contribution to allow for the editing and imputation of household data containing faulty values. The approach relies on a Bayesian hierarchical model that uses the nested data Dirichlet process mixture of products of multinomial distributions as a model for the true unobserved data, but also includes a model for the location of errors, and a reporting model for the observed responses in error. I illustrate the performance of the edit and imputation engine using data from the 2012 American Community Survey. I show that my approach can simultaneously estimate multivariate relationships in the data accurately, adjust for measurement errors, and respect impossible combinations in estimation and imputation.

The third contribution is a framework for using auxiliary information to specify nonignorable models that can handle both item and unit nonresponse simultaneously. My approach focuses on how to leverage auxiliary information from external data sources in nonresponse adjustments. The method is designed for specifying imputation models in which users can posit distinct missingness mechanisms for different blocks of variables, for example, a nonignorable model for variables with auxiliary marginal information and an ignorable model for the variables exclusive to the survey. I illustrate the framework using data on voter turnout in the Current Population Survey.

The final contribution extends the framework in the third contribution to handle nonresponse in complex surveys, so that auxiliary data can still be leveraged while respecting the survey design through survey weights. Using several simulations, I illustrate the performance of my approach when the sample is generated primarily through stratified sampling.

dc.identifier.uri

https://hdl.handle.net/10161/18766

dc.subject

Statistics

dc.subject

Census

dc.subject

Measurement error

dc.subject

Missing data

dc.subject

Multiple imputation

dc.subject

Survey Nonresponse

dc.subject

Survey Weights

dc.title

Bayesian Models for Imputing Missing Data and Editing Erroneous Responses in Surveys

dc.type

Dissertation

Files

Original bundle

Name:
Akande_duke_0066D_15115.pdf
Size:
1.39 MB
Format:
Adobe Portable Document Format
