Separating Features from Noise with Persistence and Statistics
Date
2010
Authors
Advisors
Journal Title
Journal ISSN
Volume Title
Repository Usage Stats
views
downloads
Abstract
In this thesis, we explore techniques in statistics and persistent homology, which detect features among data sets such as graphs, triangulations and point cloud. We accompany our theorems with algorithms and experiments, to demonstrate their effectiveness in practice.
We start with the derivation of graph scan statistics, a measure useful to assess the statistical significance of a subgraph in terms of edge density. We cluster graphs into densely-connected subgraphs based on this measure. We give algorithms for finding such clusterings and experiment on real-world data.
We next study statistics on persistence, for piecewise-linear functions defined on the triangulations of topological spaces. We derive persistence pairing probabilities among vertices in the triangulation. We also provide upper bounds for total persistence in expectation.
We continue by examining the elevation function defined on the triangulation of a surface. Its local maxima obtained by persistence pairing are useful in describing features of the triangulations of protein surfaces. We describe an algorithm to compute these local maxima, with a run-time ten-thousand times faster in practice than previous method. We connect such improvement with the total Gaussian curvature of the surfaces.
Finally, we study a stratification learning problem: given a point cloud sampled from a stratified space, which points belong to the same strata, at a given scale level? We assess the local structure of a point in relation to its neighbors using kernel and cokernel persistent homology. We prove the effectiveness of such assessment through several inference theorems, under the assumption of dense sample. The topological inference theorem relates the sample density with the homological feature size. The probabilistic inference theorem provides sample estimates to assess the local structure with confidence. We describe an algorithm that computes the kernel and cokernel persistence diagrams and prove its correctness. We further experiment on simple synthetic data.
Type
Department
Description
Provenance
Citation
Permalink
Citation
Wang, Bei (2010). Separating Features from Noise with Persistence and Statistics. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/2982.
Collections
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.