Graphical and Isoperimetric Perspectives on Sampling and Regularization
Date
2024
Abstract
Several recurring themes in modern Bayesian statistics and probabilistic machine learning are sampling, regularization and embeddings/latent variables. This thesis examines particular aspects of these themes from the perspective of concentration of measure, isoperimetry and spectral graph theory. These mathematical topics are intricately connected, and we often leverage related tools, such as Laplacians and concentration inequalities, in the theory and methods developed in this thesis.
On the sampling methodology side, we develop a novel spanning tree sampler called the fast-forwarded cover algorithm that improves upon the celebrated Aldous-Broder algorithm. The Aldous-Broder algorithm, and other related random-walk-based approaches for spanning tree sampling, can get stuck when the underlying graph exhibits bottleneck structures. We show that our fast-forwarded cover algorithm can break free of bottlenecks that are arbitrarily small, while maintaining exact sampling guarantees. We demonstrate how our novel sampler can be applied to posterior sampling from Bayesian models with tree components.
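For readers unfamiliar with the baseline, the Aldous-Broder algorithm runs a random walk on the graph and keeps, for each vertex, the edge through which the walk first enters it; once the walk has covered the graph, these first-entrance edges form an exactly uniform spanning tree. Below is a minimal Python sketch of that baseline (not the fast-forwarded variant developed in the thesis), assuming a connected, undirected graph given as an adjacency list:

import random

def aldous_broder(adj, start=0, seed=None):
    # Sample a uniform spanning tree of a connected, undirected graph.
    # adj: dict mapping each vertex to the list of its neighbors.
    rng = random.Random(seed)
    visited = {start}
    tree = set()
    current = start
    while len(visited) < len(adj):
        step = rng.choice(adj[current])           # one step of the random walk
        if step not in visited:                   # first entrance into a new vertex:
            visited.add(step)                     # keep the entering edge
            tree.add(frozenset((current, step)))
        current = step
    return tree

# Example: a 4-cycle with a chord
graph = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
print(aldous_broder(graph, seed=1))

Because the walk must cover every vertex, a small-conductance cut (a bottleneck) can make the expected running time very large; this is the failure mode the fast-forwarded cover algorithm is designed to escape while preserving exactness.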
For sampling theory, we investigate the statistical capacity of broad classes of deep generative models. We use concentration of measure techniques to show that there are limits to the kinds of distributions deep generative models can learn to sample from. In particular, for generative adversarial networks and variational autoencoders, under standard priors on the latent variables, only light-tailed distributions can be learned. This fact is generalized to the manifold setting using tools from concentration and geometry. We also provide further quantification of these models' capacities based on functional inequalities. This debunks a popular belief about the learning capacity of deep generative models.
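The light-tail phenomenon can be illustrated by a standard concentration argument; the following is a sketch of the general mechanism under simplifying assumptions (Gaussian latent prior, Lipschitz generator), not the thesis's precise statements or constants.

% Suppose the latent variable is Z ~ N(0, I_d) and the generator
% g : R^d -> R^k is L-Lipschitz (as feedforward networks with bounded
% weights are). For any 1-Lipschitz test function f : R^k -> R,
% Gaussian concentration gives, for all t > 0,
\[
  \mathbb{P}\bigl(\,\lvert f(g(Z)) - \mathbb{E}\,f(g(Z))\rvert \ge t\,\bigr)
  \;\le\; 2\exp\!\Bigl(-\tfrac{t^{2}}{2L^{2}}\Bigr),
\]
% so the pushforward distribution of g(Z) has sub-Gaussian tails: no choice
% of network parameters can make it match a heavy-tailed target.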
On the regularization side, we propose Fiedler regularization, a way of using a neural network's underlying graph structure to regularize the network itself. It applies in the supervised learning setting, as well as in more general settings such as regularizing autoencoders. We give theoretical guarantees and empirical results.
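For context on the name: the Fiedler value is the second-smallest eigenvalue of a graph's Laplacian, and its eigenvector (the Fiedler vector) measures how well connected the graph is. Below is a minimal NumPy sketch of computing these spectral quantities for a small weighted graph; how they enter the thesis's actual penalty on a neural network's connectivity graph is not reproduced here.

import numpy as np

def fiedler_value_and_vector(adjacency):
    # adjacency: symmetric (n, n) array of nonnegative edge weights.
    degrees = adjacency.sum(axis=1)
    laplacian = np.diag(degrees) - adjacency       # combinatorial Laplacian L = D - A
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    # eigh returns eigenvalues in ascending order; index 0 is the trivial eigenvalue 0
    return eigenvalues[1], eigenvectors[:, 1]

# Example: a path graph on four vertices (weakly connected, hence a small Fiedler value)
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
lam2, v2 = fiedler_value_and_vector(A)
print(lam2, v2)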
Last but not least, on the embeddings front, we propose a novel method for graph comparison called the embedded Laplacian distance. Typical methods for comparing graphs are computationally demanding due to the discrete and combinatorial nature of graphs. We observe that, for many applications where graph comparison is needed, an approximate comparison that captures the structure of the graphs suffices. We propose comparing graphs via continuous representations that capture meaningful graph structure, namely spectral embeddings. Theoretical and empirical results are provided.
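To make the idea concrete, here is a rough sketch of comparing two equal-sized graphs through spectral embeddings built from the normalized Laplacian; this only illustrates the general approach and is not the embedded Laplacian distance as defined in the thesis.

import numpy as np

def spectral_embedding(adjacency, k=2):
    # Embed vertices using the k eigenvectors of the normalized Laplacian
    # just above the trivial bottom one.
    degrees = adjacency.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(degrees, 1e-12)))
    laplacian = np.eye(len(adjacency)) - d_inv_sqrt @ adjacency @ d_inv_sqrt
    _, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvectors[:, 1:k + 1]

def rough_spectral_distance(adj_a, adj_b, k=2):
    # Crude comparison of two graphs with the same number of vertices:
    # sorting absolute coordinates per dimension sidesteps eigenvector sign
    # flips and vertex orderings. Only a stand-in for the thesis's method.
    emb_a = np.sort(np.abs(spectral_embedding(adj_a, k)), axis=0)
    emb_b = np.sort(np.abs(spectral_embedding(adj_b, k)), axis=0)
    return float(np.linalg.norm(emb_a - emb_b))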
Citation
Tam, Edric (2024). Graphical and Isoperimetric Perspectives on Sampling and Regularization. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/31963.