Browsing by Subject "stat.CO"
Now showing 1 - 7 of 7
Item Open Access
Complexity of zigzag sampling algorithm for strongly log-concave distributions
Lu, Jianfeng; Wang, Lihan
We study the computational complexity of the zigzag sampling algorithm for strongly log-concave distributions. The zigzag process has the advantage of not requiring time discretization for implementation, each proposed bouncing event requires only one evaluation of a partial derivative of the potential, and its convergence rate is dimension independent. Using these properties, we prove that the zigzag sampling algorithm achieves $\varepsilon$ error in chi-square divergence with a computational cost equivalent to $O\bigl(\kappa^2 d^{\frac{1}{2}}(\log\frac{1}{\varepsilon})^{\frac{3}{2}}\bigr)$ gradient evaluations in the regime $\kappa \ll \frac{d}{\log d}$ under a warm start assumption, where $\kappa$ is the condition number and $d$ is the dimension.

Item Open Access
Mathematically Quantifying Gerrymandering and the Non-responsiveness of the 2021 Georgia Congressional Districting Plan
(2022-03-12) Zhao, Zhanzhan; Hettle, Cyrus; Gupta, Swati; Mattingly, Jonathan; Randall, Dana; Herschlag, Gregory

Item Open Access
Methodological and computational aspects of parallel tempering methods in the infinite swapping limit
(2018-02-14) Lu, J; Vanden-Eijnden, E
A variant of the parallel tempering method is proposed in terms of a stochastic switching process for the coupled dynamics of replica configuration and temperature permutation. This formulation is shown to facilitate the analysis of the convergence properties of parallel tempering by large deviation theory, which indicates that the method should be operated in the infinite swapping limit to maximize sampling efficiency. The effective equation for the replica alone that arises in this infinite swapping limit simply involves replacing the original potential by a mixture potential.
The analysis of the geometric properties of this potential offers a new perspective on how to choose the temperature ladder, and on why many temperatures should typically be introduced to boost the sampling efficiency. It is also shown how to simulate the effective equation in this many-temperature regime using multiscale integrators. Finally, similar ideas are used to discuss extensions of the infinite swapping limit to the technique of simulated tempering.

Item Open Access
Microclustering: When the Cluster Sizes Grow Sublinearly with the Size of the Data Set
Miller, Jeffrey; Betancourt, Brenda; Zaidi, Abbas; Wallach, Hanna; Steorts, Rebecca C
Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman-Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some tasks, this assumption is undesirable. For example, when performing entity resolution, the size of each cluster is often unrelated to the size of the data set. Consequently, each cluster contains a negligible fraction of the total number of data points. Such tasks therefore require models that yield clusters whose sizes grow sublinearly with the size of the data set. We address this requirement by defining the microclustering property and introducing a new model that exhibits this property.
We compare this model to several commonly used clustering models by checking model fit using real and simulated data sets.

Item Open Access
Non-reversible Markov chain Monte Carlo for sampling of districting maps
Herschlag, Gregory; Mattingly, Jonathan C; Sachs, Matthias; Wyse, Evan
Evaluating the degree of partisan districting (gerrymandering) in a statistical framework typically requires an ensemble of districting plans drawn from a prescribed probability distribution that adheres to realistic and non-partisan criteria. In this article we introduce novel non-reversible Markov chain Monte Carlo (MCMC) methods for sampling such districting plans, which have improved mixing properties in comparison to previously used (reversible) MCMC algorithms. In doing so we extend the current framework for constructing non-reversible Markov chains on discrete sampling spaces by considering a generalization of skew detailed balance. We provide a detailed description of the proposed algorithms and evaluate their performance in numerical experiments.

Item Open Access
On explicit $L^2$-convergence rate estimate for piecewise deterministic Markov processes
Lu, Jianfeng; Wang, Lihan
We establish an $L^2$-exponential convergence rate for three popular piecewise deterministic Markov processes for sampling: the randomized Hamiltonian Monte Carlo method, the zigzag process, and the bouncy particle sampler. Our analysis is based on a variational framework for hypocoercivity, which combines a Poincaré-type inequality in time-augmented state space and a standard $L^2$ energy estimate. Our analysis provides explicit convergence rate estimates, which are more quantitative than existing results.

Item Open Access
SMERED: A Bayesian Approach to Graphical Record Linkage and De-duplication
Steorts, RC; Hall, R; Fienberg, SE
We propose a novel unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files.
Our key innovation is to represent the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible new representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate $k$-way posterior probabilities of matches across records, and propagate the uncertainty of record linkage into later analyses. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previously proposed methods of record linkage, despite the high-dimensional parameter space. We assess our results on real and simulated data.
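
The bipartite linkage structure described in the SMERED abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy record labels, the `links` mapping, and the helper names `clusters` and `matched_pairs` are all hypothetical, chosen only to show how records that point to latent individuals become linked to each other indirectly.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each record, keyed as (file, row), points to
# exactly one latent individual id. The values are illustrative only.
links = {
    ("file_A", 0): 0,
    ("file_A", 1): 1,
    ("file_B", 0): 0,
    ("file_B", 1): 2,
    ("file_C", 0): 0,
    ("file_C", 1): 1,
}

def clusters(links):
    """Group records by the latent individual they are linked to."""
    groups = defaultdict(list)
    for record, individual in links.items():
        groups[individual].append(record)
    return dict(groups)

def matched_pairs(links):
    """Return record pairs that share a latent individual.

    Records are never linked to each other directly; a pair matches
    only because both records point to the same individual.
    """
    pairs = set()
    for records in clusters(links).values():
        for a, b in combinations(sorted(records), 2):
            pairs.add((a, b))
    return pairs
```

In this toy example, a cluster containing $k$ records corresponds to a $k$-way match, and two records from the same file sharing an individual would correspond to a within-file duplicate.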