All-Near-Neighbor Search Among High-Dimensional Data via Hierarchical Sparse Graph Code Filtering

Loading...
Thumbnail Image

Date

2020

Journal Title

Journal ISSN

Volume Title

Repository Usage Stats

350
views
297
downloads

Abstract

This thesis addresses the problem of all-to-all near-neighbor (all-NN) search among a large dataset of discrete points in a high-dimensional feature space endowed with a distance metric. Near-neighbor search is fundamental to numerous computational tasks arising in modern data analysis, modeling, and knowledge discovery. The accuracy and efficiency of near-neighbor search depends on the underlying structure of the dataset. In emerging data-driven analysis applications, the dataset is not necessarily stationary, the structure is not necessarily built once for all queries, and search is not necessarily limited to a few query points. To facilitate accurate and efficient near-neighbor search in a stationary or dynamic dataset, we make a systematic investigation of all-NN search at multiple resolution levels, with attention to several standing or previously overlooked issues associated with high-dimensional feature spaces.

The key contributions are outlined as follows. (i) We proclaim the inherent sparse distribution of a finite discrete dataset in a high-dimensional space. We demonstrate that exploiting this sparsity is pivotal to dispelling the so-called curse of dimensionality in theoretical analysis and practical algorithm design. (ii) We introduce the adjacency graph hierarchy (AGH) as an analysis apparatus as well as base infrastructure for efficient all-NN search algorithm design. (iii) We present an efficient AGH construction algorithm by directly and deterministically pinpointing adjacent nodes at each resolution level via sparse adjacency code filtering, in lieu of exhaustive search within local node neighborhoods. We provide at the same time a statistical estimate of the significant gap in algorithm complexity between direct location of adjacent nodes and local search. (iv) We introduce an adjacency-sensitive, graph-based but decentralized approach for hierarchical partition of a discrete dataset, with potential advantage for accurate and efficient all-NN search in comparison to existing non-interacting partition structures.

Description

Provenance

Citation

Citation

Iliopoulos, Alexandros Stavros (2020). All-Near-Neighbor Search Among High-Dimensional Data via Hierarchical Sparse Graph Code Filtering. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/21017.

Collections


Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.