Browsing by Subject "Missing"
Item Open Access
An Empirical Comparison of Multiple Imputation Methods for Categorical Data (The American Statistician, 2017-04-03)
Akande, O; Li, F; Reiter, J
© 2017 American Statistical Association. Multiple imputation is a common approach for dealing with missing values in statistical databases. The imputer fills in missing values with draws from predictive models estimated from the observed data, resulting in multiple completed versions of the database. Researchers have developed a variety of default routines to implement multiple imputation; however, there has been limited research comparing the performance of these methods, particularly for categorical data. We use simulation studies to compare repeated sampling properties of three default multiple imputation methods for categorical data: chained equations using generalized linear models, chained equations using classification and regression trees, and a fully Bayesian joint distribution based on Dirichlet process mixture models. We base the simulations on categorical data from the American Community Survey. In the circumstances of this study, the results suggest that default chained equations approaches based on generalized linear models are dominated by the default regression tree and Bayesian mixture model approaches. They also suggest competing advantages for the regression tree and Bayesian mixture model approaches, making both reasonable default engines for multiple imputation of categorical data. Supplementary material for this article is available online.

Item Open Access
Multiple Imputation of Missing Covariates in Randomized Controlled Trials (2019)
Kamat, Gauri
Baseline covariates in randomized experiments play a crucial role in the estimation of treatment effects. Random assignment ensures independence of the covariates and the treatment, which is essential for objective interpretation of the effects. Covariates are also measured before observing the outcome, which guarantees validity of any causal conclusions obtained. When covariates are partly missing, it may be essential to consider whether imputations should be carried out respecting randomization or separately by treatment group. Another question that could arise is whether outcome information should be incorporated in the imputations. In view of these considerations, we examine four different ways of multiply imputing missing baseline data in randomized trials. We consider imputation in the design and outcome stages of a randomized experiment, taking into account the independence restrictions implied by randomization. We allow for the possibility of non-ignorable missingness, and use identifying restrictions in the nonparametric saturated class to obtain the full data density from the observed data distribution. We further conduct repeated sampling studies to assess the performance of the methods in three different missingness scenarios that could commonly emerge in randomized trials.
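
The multiple imputation workflow referred to in the abstracts above (fill in missing values with draws from a predictive model, repeat to obtain several completed datasets, then analyze each one) can be illustrated with a minimal Python sketch. It is not the method of either work: it assumes a single binary variable with values missing at random, a Beta-Bernoulli imputation model with a uniform prior, and the standard combining rules usually attributed to Rubin. The function names `impute_once`, `rubin_combine`, and `multiply_impute_proportion` are hypothetical.

```python
import random
from statistics import mean, variance


def impute_once(values, rng):
    """Return one completed copy of a 0/1 list in which None marks a missing value."""
    observed = [v for v in values if v is not None]
    successes = sum(observed)
    # Assumption: uniform Beta(1, 1) prior. Drawing the success probability
    # from its posterior lets parameter uncertainty carry into the imputations.
    p = rng.betavariate(1 + successes, 1 + len(observed) - successes)
    return [v if v is not None else int(rng.random() < p) for v in values]


def rubin_combine(estimates, variances):
    """Combine m point estimates and their variances into one estimate and total variance."""
    m = len(estimates)
    q_bar = mean(estimates)   # combined point estimate
    u_bar = mean(variances)   # average within-imputation variance
    b = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total


def multiply_impute_proportion(values, m=20, seed=0):
    """Estimate a proportion from m completed datasets."""
    rng = random.Random(seed)
    n = len(values)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(values, rng)
        p_hat = sum(completed) / n
        estimates.append(p_hat)
        variances.append(p_hat * (1 - p_hat) / n)
    return rubin_combine(estimates, variances)
```

Calling `multiply_impute_proportion([1, 0, 1, 1, None, 0, None, 1, 0, 1])` returns the combined proportion and its total variance. The default routines compared in the first abstract replace the simple Beta-Bernoulli model here with much richer predictive models that use the other variables.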
Item Open Access
Simultaneous Edit and Imputation for Household Data with Structural Zeros (Journal of Survey Statistics and Methodology)
Akande, Olanrewaju; Barrientos, Andres; Reiter, Jerome
Multivariate categorical data nested within households often include reported values that fail edit constraints (for example, a participating household reports a child's age as older than his biological parent's age) as well as missing values. Generally, agencies prefer datasets to be free from erroneous or missing values before analyzing them or disseminating them to secondary data users. We present a model-based engine for editing and imputation of household data based on a Bayesian hierarchical model that includes (i) a nested data Dirichlet process mixture of products of multinomial distributions as the model for the true latent values of the data, truncated to allow only households that satisfy all edit constraints, (ii) a model for the location of errors, and (iii) a reporting model for the observed responses in error. The approach propagates uncertainty due to unknown locations of errors and missing values, generates plausible datasets that satisfy all edit constraints, and can preserve multivariate relationships within and across individuals in the same household. We illustrate the approach using data from the 2012 American Community Survey.
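
To make the notion of an edit constraint concrete, the following Python sketch checks the one rule mentioned in the abstract above: a child should not be reported as older than a biological parent. It only detects failures and does not implement the Bayesian edit-imputation model. The record layout (dictionaries with `role` and `age` keys), the role labels, and the function name `failed_edits` are hypothetical, and treating equal ages as a failure is an assumption.

```python
def failed_edits(household):
    """Return a description of each edit failure found in one household.

    `household` is a list of person records such as
    {"role": "parent", "age": 40}; this layout is hypothetical.
    """
    parent_ages = [p["age"] for p in household if p["role"] == "parent"]
    failures = []
    for person in household:
        if person["role"] != "child":
            continue
        for parent_age in parent_ages:
            # Assumption: a child must be strictly younger than each parent.
            if person["age"] >= parent_age:
                failures.append(
                    f"child aged {person['age']} is not younger than "
                    f"parent aged {parent_age}"
                )
    return failures
```

A household for which `failed_edits` returns an empty list satisfies this constraint. Combinations that fail correspond to the structural zeros in the title: cells of the household-level contingency table that cannot occur.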