Skip to main content
Duke University Libraries
DukeSpace Scholarship by Duke Authors
  • Login
  • Ask
  • Menu
  • Login
  • Ask a Librarian
  • Search & Find
  • Using the Library
  • Research Support
  • Course Support
  • Libraries
  • About
View Item 
  •   DukeSpace
  • Theses and Dissertations
  • Duke Dissertations
  • View Item
  •   DukeSpace
  • Theses and Dissertations
  • Duke Dissertations
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Nonparametric Bayesian Methods for Multiple Imputation of Large Scale Incomplete Categorical Data in Panel Studies

Thumbnail
View / Download
1.1 Mb
Date
2012
Author
Si, Yajuan
Advisor
Reiter, Jerome P
Repository Usage Stats
518
views
1,098
downloads
Abstract

The thesis develops nonparametric Bayesian models to handle incomplete categorical variables in data sets with high dimension using the framework of multiple imputation. It presents methods for ignorable missing data in cross-sectional studies, and potentially non-ignorable missing data in panel studies with refreshment samples.

The first contribution is a fully Bayesian, joint modeling approach of multiple imputation for categorical data based on Dirichlet process mixtures of multinomial distributions. The approach automatically models complex dependencies while being computationally expedient.

I illustrate repeated sampling properties of the approach

using simulated data. This approach offers better performance than default chained equations methods, which are often used in such settings. I apply the methodology to impute missing background data in the 2007 Trends in International Mathematics and Science Study.

For the second contribution, I extend the nonparametric Bayesian imputation engine to consider a mix of potentially non-ignorable attrition and ignorable item nonresponse in multiple wave panel studies. Ignoring the attrition in models for panel data can result in biased inference if the reason for attrition is systematic and related to the missing values. Panel data alone cannot estimate the attrition effect without untestable assumptions about the missing data mechanism. Refreshment samples offer an extra data source that can be utilized to estimate the attrition effect while reducing reliance on strong assumptions of the missing data mechanism.

I consider two novel Bayesian approaches to handle the attrition and item non-response simultaneously under multiple imputation in a two wave panel with one refreshment sample when the variables involved are categorical and high dimensional.

First, I present a semi-parametric selection model that includes an additive non-ignorable attrition model with main effects of all variables, including demographic variables and outcome measures in wave 1 and wave 2. The survey variables are modeled jointly using Bayesian mixture of multinomial distributions. I develop the posterior computation algorithms for the semi-parametric selection model under different prior choices for the regression coefficients in the attrition model.

Second, I propose two Bayesian pattern mixture models for this scenario that use latent classes to model the dependency among the variables and the attrition. I develop a dependent Bayesian latent pattern mixture model for which variables are modeled via latent classes and attrition is treated as a covariate in the class allocation weights. And, I develop a joint Bayesian latent pattern mixture model, for which attrition and variables are modeled jointly via latent classes.

I show via simulation studies that the pattern mixture models can recover true parameter estimates, even when inferences based on the panel alone are biased from attrition.

I apply both the selection and pattern mixture models to data from the 2007-2008 Associated Press/Yahoo News election panel study.

Type
Dissertation
Department
Statistical Science
Subject
Statistics
Categorical
Large scale
Multiple imputation
Nonparametric Bayes
Panel
Refreshment sample
Permalink
https://hdl.handle.net/10161/5837
Citation
Si, Yajuan (2012). Nonparametric Bayesian Methods for Multiple Imputation of Large Scale Incomplete Categorical Data in Panel Studies. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/5837.
Collections
  • Duke Dissertations
More Info
Show full item record
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.

Rights for Collection: Duke Dissertations


Works are deposited here by their authors, and represent their research and opinions, not that of Duke University. Some materials and descriptions may include offensive content. More info

Make Your Work Available Here

How to Deposit

Browse

All of DukeSpaceCommunities & CollectionsAuthorsTitlesTypesBy Issue DateDepartmentsAffiliations of Duke Author(s)SubjectsBy Submit DateThis CollectionAuthorsTitlesTypesBy Issue DateDepartmentsAffiliations of Duke Author(s)SubjectsBy Submit Date

My Account

LoginRegister

Statistics

View Usage Statistics
Duke University Libraries

Contact Us

411 Chapel Drive
Durham, NC 27708
(919) 660-5870
Perkins Library Service Desk

Digital Repositories at Duke

  • Report a problem with the repositories
  • About digital repositories at Duke
  • Accessibility Policy
  • Deaccession and DMCA Takedown Policy

TwitterFacebookYouTubeFlickrInstagramBlogs

Sign Up for Our Newsletter
  • Re-use & Attribution / Privacy
  • Harmful Language Statement
  • Support the Libraries
Duke University