All-Near-Neighbor Search Among High-Dimensional Data via Hierarchical Sparse Graph Code Filtering

dc.contributor.advisor

Sun, Xiaobai

dc.contributor.author

Iliopoulos, Alexandros Stavros

dc.date.accessioned

2020-06-09T17:59:58Z

dc.date.available

2020-11-27T09:17:16Z

dc.date.issued

2020

dc.department

Computer Science

dc.description.abstract

This thesis addresses the problem of all-to-all near-neighbor (all-NN) search among a large dataset of discrete points in a high-dimensional feature space endowed with a distance metric. Near-neighbor search is fundamental to numerous computational tasks arising in modern data analysis, modeling, and knowledge discovery. The accuracy and efficiency of near-neighbor search depends on the underlying structure of the dataset. In emerging data-driven analysis applications, the dataset is not necessarily stationary, the structure is not necessarily built once for all queries, and search is not necessarily limited to a few query points. To facilitate accurate and efficient near-neighbor search in a stationary or dynamic dataset, we make a systematic investigation of all-NN search at multiple resolution levels, with attention to several standing or previously overlooked issues associated with high-dimensional feature spaces.

The key contributions are outlined as follows. (i) We proclaim the inherent sparse distribution of a finite discrete dataset in a high-dimensional space. We demonstrate that exploiting this sparsity is pivotal to dispelling the so-called curse of dimensionality in theoretical analysis and practical algorithm design. (ii) We introduce the adjacency graph hierarchy (AGH) as an analysis apparatus as well as base infrastructure for efficient all-NN search algorithm design. (iii) We present an efficient AGH construction algorithm by directly and deterministically pinpointing adjacent nodes at each resolution level via sparse adjacency code filtering, in lieu of exhaustive search within local node neighborhoods. We provide at the same time a statistical estimate of the significant gap in algorithm complexity between direct location of adjacent nodes and local search. (iv) We introduce an adjacency-sensitive, graph-based but decentralized approach for hierarchical partition of a discrete dataset, with potential advantage for accurate and efficient all-NN search in comparison to existing non-interacting partition structures.

dc.identifier.uri

https://hdl.handle.net/10161/21017

dc.subject

Computer science

dc.subject

adjacency graph hierarchy

dc.subject

multi-scale interactions

dc.subject

near-neighbor search

dc.subject

spatial partitions

dc.title

All-Near-Neighbor Search Among High-Dimensional Data via Hierarchical Sparse Graph Code Filtering

dc.type

Dissertation

duke.embargo.months

5.621917808219178

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Iliopoulos_duke_0066D_15750.pdf
Size:
4.08 MB
Format:
Adobe Portable Document Format

Collections