Durability Queries on Temporal Data
Date
2020
Abstract
Temporal data is ubiquitous in everyday life, but it tends to be noisy and often exhibits transient patterns. To make better decisions with data, we must avoid jumping to conclusions based on a few particular query results or observations. Instead, a useful perspective is to consider "durability", or, intuitively speaking, to find results that are robust and stand "the test of time". This thesis studies durability queries on temporal data, with the goal of returning durable results efficiently and effectively.
The focus of this thesis is two-fold: (1) design meaningful and practical notions of durability (and corresponding queries) on different types of temporal data, and (2) develop efficient techniques for durability query processing.
We first study sequence-based temporal datasets where each temporal object has a series of values indexed by time.
Durability queries ask for objects whose (snapshot) values were among the top k for at least some fraction of the time during a given time interval; e.g., "from 2013 to 2016, United Airlines had the highest stock price among American-based airline companies for at least 80% of the time."
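To make the sequence-based definition concrete, here is a minimal brute-force sketch in Python. It is not the thesis's algorithm, only a naive baseline that follows the definition directly. The function name `durable_topk`, the dictionary layout of `series`, and the parameter names `k`, `start`, `end`, and `tau` are assumptions made for illustration, and time is assumed to be discretized into integer steps.

```python
from typing import Dict, List


def durable_topk(series: Dict[str, List[float]], k: int,
                 start: int, end: int, tau: float) -> List[str]:
    """Return the objects that rank in the top k for at least a
    fraction tau of the time steps in the closed interval [start, end].

    Hypothetical helper for illustration: `series` maps an object name
    to its list of snapshot values, indexed by integer time step.
    """
    if not 0 <= start <= end:
        raise ValueError("expected 0 <= start <= end")
    counts = {name: 0 for name in series}
    for t in range(start, end + 1):
        # Rank all objects by their snapshot value at time t, highest first.
        # Ties are broken by insertion order because sorted() is stable.
        ranked = sorted(series, key=lambda name: series[name][t], reverse=True)
        for name in ranked[:k]:
            counts[name] += 1
    length = end - start + 1
    # Keep the objects that were in the top k often enough.
    return sorted(name for name, c in counts.items() if c >= tau * length)


prices = {
    "A": [5, 6, 7, 8, 9],
    "B": [6, 5, 6, 7, 10],
    "C": [1, 2, 3, 4, 5],
}
print(durable_topk(prices, k=1, start=0, end=4, tau=0.5))
```

This baseline re-sorts every snapshot, so its cost grows with the number of objects times the interval length; the indexing methods the thesis refers to are aimed at avoiding exactly this kind of full scan.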
Second, we consider instant-stamped temporal datasets where each data record is stamped by a time instant.
Here, durability queries look for records that stand out among nearby records (defined by a time window) and retain their supremacy for a long period of time; e.g., "On January 22, 2006, Kobe Bryant dropped 81 points against the Toronto Raptors, a scoring record that has yet to be broken."
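One possible reading of the instant-stamped setting can be sketched as follows: a record counts as durable if no later record within a given time window has a strictly higher value. This forward-looking interpretation, the function name `durable_records`, the `(timestamp, label, value)` record layout, and the `window` parameter are all assumptions for illustration; the thesis may define the window and the ranking differently.

```python
from typing import List, Tuple

# Hypothetical record layout: (timestamp, label, value)
Record = Tuple[int, str, float]


def durable_records(records: List[Record], window: int) -> List[Record]:
    """Return the records whose value is not exceeded by any later
    record stamped within `window` time units after them."""
    ordered = sorted(records, key=lambda r: r[0])
    result = []
    for i, (t, label, value) in enumerate(ordered):
        beaten = False
        for t2, _, v2 in ordered[i + 1:]:
            if t2 - t > window:
                # Later records fall outside the window; stop scanning.
                break
            if v2 > value:
                beaten = True
                break
        if not beaten:
            result.append((t, label, value))
    return result


games = [(1, "a", 50.0), (3, "b", 81.0), (4, "c", 60.0),
         (10, "d", 70.0), (12, "e", 90.0)]
print([label for _, label, _ in durable_records(games, window=5)])
```

Note that in this naive version a record close to the end of the data qualifies trivially, because few or no later records exist to challenge it. The scan is quadratic in the worst case, so it serves only as a reference point for the definition.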
Finally, going beyond the analysis of historical data, we extend the notion of durability into the future, where durability must be predicted by performing stochastic simulation of temporal models.
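As a rough illustration of predicting durability by simulation, the sketch below uses plain Monte Carlo sampling to estimate the probability that a value stays at or above a threshold for an entire future horizon. The Gaussian random walk is an assumed stand-in for whatever temporal model is actually used, and the function name `estimate_durability` and all of its parameters are hypothetical.

```python
import random


def estimate_durability(start: float, threshold: float, horizon: int,
                        drift: float, volatility: float,
                        n_paths: int = 2000, seed: int = 0) -> float:
    """Estimate the probability that a simulated value stays at or above
    `threshold` for all of the next `horizon` steps.

    Assumed model: a random walk with constant drift and Gaussian noise.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    durable = 0
    for _ in range(n_paths):
        value = start
        survived = True
        for _ in range(horizon):
            value += drift + rng.gauss(0.0, volatility)
            if value < threshold:
                # The path dropped below the threshold; it is not durable.
                survived = False
                break
        if survived:
            durable += 1
    return durable / n_paths


print(estimate_durability(start=10.0, threshold=5.0, horizon=20,
                          drift=0.0, volatility=1.0))
```

Plain Monte Carlo of this kind needs many simulated paths when the durable event is rare, which is one reason more careful sampling strategies matter in practice.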
For answering durability queries across these problem settings, we apply principled approaches to design fast, scalable algorithms and indexing methods.
Our solutions broadly combine geometric, statistical, and approximate query processing techniques to provide a meaningful balance between query efficiency and result quality, along with theoretical worst-case (or average-case) guarantees.
Citation
Gao, Junyang (2020). Durability Queries on Temporal Data. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/21471.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.