ALERT: This system is being upgraded on Tuesday December 12. It will not be available for use for several hours that day while the upgrade is in progress. Deposits to DukeSpace will be disabled on Monday December 11, so no new items are to be added to the repository while the upgrade is in progress. Everything should be back to normal by the end of day, December 12.

Show simple item record

Efficient Algorithms for Querying Large and Uncertain Data

dc.contributor.advisor Agarwal, Pankaj Kumar Sintos, Stavros 2020-09-18T16:00:42Z 2020-09-18T16:00:42Z 2020
dc.description.abstract <p>Query processing is an important problem in many research fields including database systems, data mining, and geometric computing. The goal is to preprocess input data into an index to efficiently answer queries over data. In many real applications we need to handle a large amount of data or/and data that are ambiguous because of human errors and data integration. With the data sets becoming increasingly large and complex, queries are also becoming complex, therefore, new challenging problems have emerged in the area of query processing.</p><p>Data summarization helps to accelerate expensive data queries by running the query procedure in a small data summary rather than the entire large dataset.</p><p>The first part of this thesis studies the problem of finding small high quality data summaries in the Euclidean space.</p><p>Data summarization can be applied either in the entire dataset or among points in a query range given by the user.</p><p>Efficient algorithms are proposed to get a small summarization of the entire dataset so that Top-$k$ or user preference queries are answered efficiently with high quality guarantees by looking only in the data summary.</p><p>Such a summary can also be used in multi-criteria decision problems.</p><p>Furthermore, near-linear space indexes are designed so that given a query rectangle, a diverse high-valued summary of $k$ points in the query range is returned efficiently.</p><p>The second part of the thesis focuses on data queries over uncertain data.</p><p>Uncertainty is typically captured using stochastic data models, and querying data requires either statistics about the probabilistic behavior of the underlying data, or data cleaning to reduce the uncertainty of the query answer.</p><p>Small size indexes are built so that, given a query rectangle, statistics of the MAX (or top-$k$) operator among the points in the query range are computed in sublinear time.</p><p>In addition, approximation algorithms are proposed for selecting data to clean in order to minimize the variance of the result of a query function over a probabilistic database.</p><p>While most of the methods hold for general functions, the main focus is on minimizing the variance in fact checking operations.</p>
dc.subject Computer science
dc.subject data queries
dc.subject index
dc.subject summarization
dc.subject top-k
dc.subject uncertainty
dc.title Efficient Algorithms for Querying Large and Uncertain Data
dc.type Dissertation
dc.department Computer Science

Files in this item


This item appears in the following Collection(s)

Show simple item record