Browsing by Department "Statistics and Decision Sciences"
Item Open Access
Conditions for Rapid and Torpid Mixing of Parallel and Simulated Tempering on Multimodal Distributions (2007-09-14)
Woodard, Dawn Banister

Stochastic sampling methods are ubiquitous in statistical mechanics, Bayesian statistics, and theoretical computer science. However, when the distribution being sampled is multimodal, many of these techniques converge slowly, so that a great deal of computing time is necessary to obtain reliable answers. Parallel and simulated tempering are sampling methods that are designed to converge quickly even for multimodal distributions. In this thesis, we assess the extent to which this goal is achieved.

We give conditions under which a Markov chain constructed via parallel or simulated tempering is guaranteed to be rapidly mixing, meaning that it converges quickly. These conditions are applicable to a wide range of multimodal distributions arising in Bayesian statistical inference and statistical mechanics. We provide lower bounds on the spectral gaps of parallel and simulated tempering. These bounds imply a single set of sufficient conditions for rapid mixing of both techniques. A direct consequence of our results is rapid mixing of parallel and simulated tempering for several normal mixture models in R^M as M increases, and for the mean-field Ising model.

We also obtain upper bounds on the convergence rates of parallel and simulated tempering, yielding a single set of sufficient conditions for torpid mixing of both techniques. These conditions imply torpid mixing of parallel and simulated tempering on a normal mixture model with unequal covariances in R^M as M increases, and on the mean-field Potts model with q ≥ 3, regardless of the number and choice of temperatures, as well as on the mean-field Ising model if an insufficient (fixed) set of temperatures is used. The latter result is in contrast to the rapid mixing of parallel and simulated tempering on the mean-field Ising model with a linearly increasing set of temperatures.
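For readers unfamiliar with the method, the following is a minimal sketch of a parallel tempering sampler for an illustrative one-dimensional bimodal target; the target density, temperature ladder, and proposal scale are placeholder choices, not the distributions or schedules analyzed in the thesis.

```python
import numpy as np

# Minimal parallel tempering sketch for an illustrative bimodal target.
# The temperature ladder, proposal scale, and target are placeholders,
# not the distributions or schedules analyzed in the thesis.

def log_target(x):
    # Mixture of two well-separated normal modes (illustrative).
    return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

def parallel_tempering(n_iter=5000, betas=(1.0, 0.5, 0.25, 0.1), seed=0):
    rng = np.random.default_rng(seed)
    K = len(betas)
    x = np.zeros(K)            # one chain per temperature
    samples = np.empty(n_iter)
    for t in range(n_iter):
        # Within-temperature random-walk Metropolis updates.
        for k in range(K):
            prop = x[k] + rng.normal(scale=1.0)
            if np.log(rng.uniform()) < betas[k] * (log_target(prop) - log_target(x[k])):
                x[k] = prop
        # Propose swapping the states of a random adjacent temperature pair.
        k = rng.integers(K - 1)
        log_acc = (betas[k] - betas[k + 1]) * (log_target(x[k + 1]) - log_target(x[k]))
        if np.log(rng.uniform()) < log_acc:
            x[k], x[k + 1] = x[k + 1], x[k]
        samples[t] = x[0]      # keep draws from the cold (beta = 1) chain
    return samples

draws = parallel_tempering()
```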
Item Open Access
Model Selection and Multivariate Inference Using Data Multiply Imputed for Disclosure Limitation and Nonresponse (2007-12-07)
Kinney, Satkartar K

This thesis proposes inferential methods for use with multiple imputation for missing data and statistical disclosure limitation, and describes an application of multiple imputation to protect data confidentiality. A third component concerns model selection in random effects models.

The use of multiple imputation to generate partially synthetic public release files for confidential datasets has the potential to limit unauthorized disclosure while allowing valid inferences to be made. When confidential datasets contain missing values, it is natural to use multiple imputation to handle the missing data simultaneously with the generation of synthetic data. This is done in a two-stage process so that the variability may be estimated properly. The combining rules for data multiply imputed in this fashion differ from those developed for multiple imputation in a single stage. Combining rules for scalar estimands have been derived previously; here hypothesis tests for multivariate components are derived.

Longitudinal business data are widely desired by researchers but difficult to make available to the public because of confidentiality constraints. An application of partially synthetic data to the U.S. Census Longitudinal Business Database is described. This is a large, complex economic census for which nearly the entire database must be imputed in order for it to be considered for public release. The methods used are described, and analytical results for synthetic data generated for a subgroup are presented. Modifications to the multiple imputation combining rules for population data are also developed.

Model selection is an area in which few methods have been developed for use with multiply imputed data. Careful consideration is given to how Bayesian model selection can be conducted with multiply imputed data. The usual assumption of correspondence between the imputation and analyst models is not amenable to model selection procedures; hence, the model selection procedure developed incorporates the imputation model and assumes that the imputation model is known to the analyst.

Lastly, a model selection problem outside the multiple imputation context is addressed. A fully Bayesian approach for selecting fixed and random effects in linear and logistic models is developed, using a parameter-expanded stochastic search Gibbs sampling algorithm to estimate the exact model-averaged posterior distribution. This approach automatically identifies subsets of predictors having nonzero fixed coefficients or nonzero random effects variance, while allowing for uncertainty in the model selection process.
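As background for the combining rules discussed above, here is a minimal sketch of the standard single-stage rules for a scalar estimand (Rubin's rules); the two-stage and multivariate extensions derived in the thesis are not reproduced here, and the input values are illustrative.

```python
import numpy as np

# Standard single-stage combining rules for a scalar estimand across m
# multiply imputed datasets (Rubin's rules), shown only as background;
# the thesis derives the two-stage and multivariate extensions.

def combine_scalar(estimates, variances):
    """estimates, variances: length-m sequences of within-dataset point
    estimates and their estimated variances."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()                   # combined point estimate
    u_bar = variances.mean()                   # within-imputation variance
    b = estimates.var(ddof=1)                  # between-imputation variance
    total_var = u_bar + (1.0 + 1.0 / m) * b    # total variance
    return q_bar, total_var

# Toy usage with made-up estimates from m = 3 imputed datasets.
q, t = combine_scalar([1.02, 0.97, 1.05], [0.040, 0.050, 0.045])
```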
Item Open Access
Using Data Augmentation and Stochastic Differential Equations in Spatio Temporal Modeling (2008-12-12)
Puggioni, Gavino

One of the biggest challenges in spatiotemporal modeling is how to manage the large amount of missing information. Data augmentation techniques are frequently used to infer missing values, unobserved or latent processes, and approximations of continuous-time processes that are discretely observed.

The literature on inference for partially observed stochastic differential equations (SDEs) has been growing in recent years, and many attempts have been made to tackle this problem from very different perspectives. The goal of this thesis is not a comparison of the different methods; the focus is instead on Bayesian inference for SDEs in a spatial context, using a data augmentation approach. While other methods can be less computationally intensive or more accurate in some cases, the main advantage of the Bayesian approach based on model augmentation is its general scope of applicability.

In Chapter 2 we propose methods to model space-time data as noisy realizations of an underlying system of nonlinear SDEs whose parameters are realizations of spatially correlated Gaussian processes. Models formulated in this fashion are complex and present several challenges in their estimation: standard methods degenerate as the level of refinement in the discretization grows. The innovation algorithm overcomes such problems, and we present an extension of the innovation scheme to the case of high-dimensional parameter spaces. Our algorithm, although presented in spatial SDE examples, can be applied in any general multivariate SDE setting.
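As a minimal illustration of the discretization underlying such data-augmentation schemes, the sketch below simulates a scalar SDE on a fine grid with the Euler scheme; the drift, diffusion, and step size are illustrative assumptions, not the spatial models of Chapter 2.

```python
import numpy as np

# Euler discretization of a scalar SDE dX = mu(X) dt + sigma(X) dW.
# This is only the basic building block behind data-augmentation schemes
# that impute the latent path on a fine grid between observations; the
# drift, diffusion, and step size are illustrative choices.

def euler_maruyama(x0, mu, sigma, dt, n_steps, rng):
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt))          # Brownian increment
        x[i + 1] = x[i] + mu(x[i]) * dt + sigma(x[i]) * dw
    return x

rng = np.random.default_rng(1)
path = euler_maruyama(x0=0.0,
                      mu=lambda x: -0.5 * x,        # mean-reverting drift
                      sigma=lambda x: 0.3,          # constant diffusion
                      dt=0.01, n_steps=1000, rng=rng)
```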
In Chapter 3 we discuss additional insights regarding SDEs with a spatial interpretation, in which spatial dependence is enforced through the driving Brownian motion.
In Chapter 4 we discuss possible refinements of the SDE parameter estimation. Such refinements, which involve second-order SDE approximations, have a more general scope than spatiotemporal modeling and can be applied in a variety of settings.
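As one concrete example of a discretization refinement beyond the basic Euler scheme (not necessarily the second-order approximation used in Chapter 4), the Milstein step below adds a correction term involving the derivative of the diffusion coefficient.

```python
import numpy as np

# Milstein step for dX = mu(X) dt + sigma(X) dW, shown as one example of a
# higher-order refinement over the Euler scheme; whether this matches the
# approximation discussed in Chapter 4 is not claimed.

def milstein_step(x, mu, sigma, dsigma, dt, rng):
    dw = rng.normal(scale=np.sqrt(dt))
    return (x + mu(x) * dt + sigma(x) * dw
            + 0.5 * sigma(x) * dsigma(x) * (dw ** 2 - dt))

rng = np.random.default_rng(2)
x = 1.0
for _ in range(100):
    x = milstein_step(x,
                      mu=lambda x: -x,            # illustrative drift
                      sigma=lambda x: 0.5 * x,    # state-dependent diffusion
                      dsigma=lambda x: 0.5,       # derivative of sigma
                      dt=0.01, rng=rng)
```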
In the last chapter we propose methodology for fitting space-time models to data collected in a wireless sensor network when suppression and failure in transmission are considered. Here too we make use of data augmentation techniques, in conjunction with linear constraints on the missing values.
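A generic way to impute missing values subject to a linear constraint is to condition a Gaussian imputation model on A x = b; the sketch below illustrates that construction with hypothetical names and a toy sum constraint, and is not the specific suppression or transmission-failure structure used in the chapter.

```python
import numpy as np

# Sampling a Gaussian vector x ~ N(mean, cov) conditional on the linear
# constraint A x = b. A generic construction for constrained imputation,
# not the particular constraints of the sensor-network application.

def sample_constrained_gaussian(mean, cov, A, b, rng):
    mean = np.asarray(mean, float)
    cov = np.asarray(cov, float)
    S = A @ cov @ A.T
    K = cov @ A.T @ np.linalg.inv(S)               # "gain" matrix
    cond_mean = mean + K @ (b - A @ mean)
    cond_cov = cov - K @ A @ cov                   # singular but PSD
    return rng.multivariate_normal(cond_mean, cond_cov, method="svd")

rng = np.random.default_rng(3)
mean = np.zeros(3)
cov = np.eye(3)
A = np.ones((1, 3))            # toy constraint: the three values sum to 2.5
b = np.array([2.5])
x = sample_constrained_gaussian(mean, cov, A, b, rng)
```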