dc.contributor.advisor Reiter, Jerome P en_US
dc.contributor.author Kinney, Satkartar K en_US
dc.date.accessioned 2008-01-02T16:33:29Z
dc.date.available 2008-01-02T16:33:29Z
dc.date.issued 2007-12-07 en_US
dc.identifier.uri http://hdl.handle.net/10161/437
dc.description Dissertation en_US
dc.description.abstract This thesis proposes some inferential methods for use with multiple imputation for missing data and statistical disclosure limitation, and describes an application of multiple imputation to protect data confidentiality. A third component concerns model selection in random effects models. The use of multiple imputation to generate partially synthetic public release files for confidential datasets has the potential to limit unauthorized disclosure while allowing valid inferences to be made. When confidential datasets contain missing values, it is natural to use multiple imputation to handle the missing data simultaneously with the generation of synthetic data. This is done in a two-stage process so that the variability may be estimated properly. The combining rules for data multiply imputed in this fashion differ from those developed for multiple imputation in a single stage. Combining rules for scalar estimands have been derived previously; here hypothesis tests for multivariate components are derived. Longitudinal business data are widely desired by researchers, but difficult to make available to the public because of confidentiality constraints. An application of partially synthetic data to the U.S. Census Longitudinal Business Database is described. This is a large, complex economic census for which nearly the entire database must be imputed in order for it to be considered for public release. The methods used are described, and analytical results are presented for synthetic data generated for a subgroup. Modifications to the multiple imputation combining rules for population data are also developed. Model selection is an area in which few methods have been developed for use with multiply-imputed data. Careful consideration is given to how Bayesian model selection can be conducted with multiply-imputed data. The usual assumption of correspondence between the imputation and analyst models is not amenable to model selection procedures. Hence, the model selection procedure developed incorporates the imputation model and assumes that the imputation model is known to the analyst. Lastly, a model selection problem outside the multiple imputation context is addressed. A fully Bayesian approach for selecting fixed and random effects in linear and logistic models is developed utilizing a parameter expanded stochastic search Gibbs sampling algorithm to estimate the exact model-averaged posterior distribution. This approach automatically identifies subsets of predictors having nonzero fixed coefficients or nonzero random effects variance, while allowing uncertainty in the model selection process. en_US
dc.format.extent 3858359 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US
dc.subject Statistics en_US
dc.title Model Selection and Multivariate Inference Using Data Multiply Imputed for Disclosure Limitation and Nonresponse en_US
dc.type Dissertation en_US
dc.department Statistics and Decision Sciences en_US
