Duke University Libraries
View Item 
  •   DukeSpace
  • Theses and Dissertations
  • Duke Dissertations
  • View Item
  •   DukeSpace
  • Theses and Dissertations
  • Duke Dissertations
  • View Item
    • Login
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Succinct Data Structures

    Thumbnail
    View / Download
    1.2 Mb
    Date
    2007-12-14
    Author
    Gupta, Ankur
    Advisor
    Vitter, Jeffrey S
    Repository Usage Stats
    325
    views
    819
    downloads
    Abstract
    The world is drowning in data. The recent explosion of web publishing, XML data, bioinformation, scientific data, image data, geographical map data, and even email communications has put a strain on our ability to manage the information contained there. In general, the influx of massive data sets for all kinds of data present a number of difficulties with storage, organization of information, and data accessibility. A primary computing challenge in these cases is how to compress the data but still allow it to be queried quickly.In real-life situations, many instances of data are highly compressible, presenting a major opportunity for space savings. In mobile applications, such savings are critical, since space and the power to access information are at a premium. In a streaming environment, where new data are being generated constantly, compression can aid in prediction as well. In the case of bioinformatics, understanding succinct representations of DNA sequences could lead to a more fundamental understanding of the nature of our own "data stream," perhaps even giving hints on secondary and tertiary structure, gene evolution, and other important topics.In this thesis, we focus our attention on the important problem of compressed text indexing<\i>, where the goal is to compress a text document and allow arbitrary searching for patterns in the best possible time <i>without first decompressing the text<\i>. We develop a number of compressed data structures that either solve this problem directly, or are used as smaller components of an overall text indexing solution. Each component has a number of applications beyond text indexing as well. For each structure, we provide a theoretical study of its space usage and query performance on a suite of operations crucial to access the stored data. In each case, we relate its space usage to the <i>compressed size of the original data and show that the supported operations function in near-optimal or optimal time. We also present a number of experimental results that validate our theoretical findings, showing that our methodology is competitive with the state-of-the-art.
    Type
    Dissertation
    Department
    Computer Science
    Subject
    Computer Science
    compression
    text indexing
    dictionaries
    dynamic
    optimal
    Permalink
    http://hdl.handle.net/10161/434
    Citation
    Gupta, Ankur (2007). Succinct Data Structures. Dissertation, Duke University. Retrieved from http://hdl.handle.net/10161/434.
    Collections
    • Duke Dissertations
    More Info
    Show full item record
    Creative Commons License
    This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.

    Rights for Collection: Duke Dissertations

     

     

    Browse

    All of DukeSpaceCommunities & CollectionsAuthorsTitlesTypesBy Issue DateDepartmentsAffiliations of Duke Author(s)SubjectsBy Submit DateThis CollectionAuthorsTitlesTypesBy Issue DateDepartmentsAffiliations of Duke Author(s)SubjectsBy Submit Date

    My Account

    LoginRegister

    Statistics

    View Usage Statistics