Succinct Data Structures
Abstract
The world is drowning in data. The recent explosion of web publishing,
XML data, bioinformation, scientific data, image data, geographical map
data, and even email communications has put a strain on our ability to
manage the information contained there. In general, the influx of
massive data sets for all kinds of data present a number of difficulties
with storage, organization of information, and data accessibility. A
primary computing challenge in these cases is how to compress the data
but still allow it to be queried quickly.In real-life situations, many instances of
data are highly compressible, presenting a major opportunity for space savings. In
mobile applications, such savings are critical, since space and the power to
access information are at a premium. In a streaming environment, where
new data are being generated constantly, compression can aid in prediction as well.
In the case of bioinformatics, understanding
succinct representations of DNA sequences could lead to a more
fundamental understanding of the nature of our own "data stream,"
perhaps even giving hints on secondary and tertiary structure, gene
evolution, and other important topics.In this thesis, we focus our attention on the
important problem of
compressed text indexing<\i>, where the goal is to compress a text
document and allow arbitrary searching for patterns in the best possible
time <i>without first decompressing the text<\i>. We develop a number of
compressed data structures that either solve this problem directly, or
are used as smaller components of an overall text indexing solution. Each component
has a number of applications beyond text indexing as well. For each structure, we
provide a theoretical study of its space usage and query
performance on a suite of operations crucial to access the stored data.
In each case, we relate its space usage to the <i>compressed size of
the original data and show that the supported operations function in
near-optimal or optimal time. We also present a number of experimental
results that validate our theoretical findings, showing that our methodology is competitive
with the state-of-the-art.
Type
DissertationDepartment
Computer SciencePermalink
http://hdl.handle.net/10161/434Citation
Gupta, Ankur (2007). Succinct Data Structures. Dissertation, Duke University. Retrieved from http://hdl.handle.net/10161/434.Collections
More Info
Show full item record
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations