Query Answering in Multi-Relational Databases Under Differential Privacy

Date

2019

Abstract

Data collection has become a staple of both our digital and "offline" activities. Government agencies, medical institutions, Internet companies, and academic institutions are among the main actors that collect and store users' data. Analysis and sharing of this data is paramount in our increasingly data-driven world.

Data sharing provides substantial societal value; however, it is not cost-free: data sharing is fundamentally at odds with individuals' privacy.

As a result, data privacy has become a major research area, with differential privacy emerging as the de facto data privacy framework.

To mask the presence of any individual in the database, differentially private algorithms usually add noise to data releases. This noise is calibrated by the so-called "privacy budget", a parameter that quantifies the allowed privacy loss: a smaller budget means more noise and stronger privacy.
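
To make the noise/budget relationship concrete, the sketch below shows the standard Laplace mechanism for a single count query: noise is drawn with scale sensitivity / epsilon, so halving the budget epsilon doubles the expected noise. This is a textbook illustration, not code from the dissertation; the function names `laplace_noise` and `private_count` are hypothetical, and a sensitivity of 1 assumes each individual contributes at most one row to the count.

```python
import random
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponentials with mean `scale`
    is Laplace-distributed with location 0 and scale `scale`.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float,
                  sensitivity: float = 1.0,
                  rng: Optional[random.Random] = None) -> float:
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    `sensitivity` is the most the true count can change when one individual
    is added or removed (assumed 1: one row per individual).
    """
    if epsilon <= 0:
        raise ValueError("privacy budget epsilon must be positive")
    if rng is None:
        rng = random.Random()
    # Smaller epsilon -> larger noise scale -> stronger privacy, lower accuracy.
    return true_count + laplace_noise(sensitivity / epsilon, rng)
```

For example, `private_count(1000, 0.1)` is typically off by about 10, while `private_count(1000, 10.0)` is typically off by about 0.1, since the expected absolute error of Laplace noise equals its scale.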

One major shortcoming of both the definition and the supporting literature is that they apply to flat (single-table) data, and extending them to multi-relational schemas is nontrivial. For multi-relational schemas, the privacy semantics are not well defined, since an individual may affect multiple relations, each to a different degree.

Moreover, no existing system permits accurate, differentially private answering of SQL queries while imposing a fixed privacy loss across all queries posed by the analyst.

In this thesis, we present PrivSQL, a first-of-its-kind end-to-end differentially private relational database system. PrivSQL allows analysts to query a standard relational database using a rich class of SQL queries. PrivSQL enables the data owner to flexibly define the privacy semantics over the schema and provides a fixed privacy loss across all queries submitted by the analyst.

PrivSQL works by carefully selecting a set of views over the database schema, generating a set of private synopses over those views, and lastly answering incoming analyst queries based on the synopses.
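
The three-phase design above explains why the privacy loss can stay fixed: the whole budget is spent once, when the synopses are built, and every later query is answered from the synopses alone, which is post-processing and costs no extra budget. The sketch below illustrates only that general idea under strong simplifying assumptions; it is not PrivSQL's view-selection or synopsis algorithm. The names `build_synopses` and `answer`, the choice of one-attribute histogram views over a public domain, and the even budget split (sequential composition) are all hypothetical, and each histogram is assumed to have sensitivity 1 because each individual owns at most one row.

```python
import random
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Row = Dict[str, Hashable]


def laplace_noise(scale: float, rng: random.Random) -> float:
    # Difference of two i.i.d. exponentials with mean `scale` ~ Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def build_synopses(rows: List[Row],
                   views: Dict[str, Tuple[str, Sequence[Hashable]]],
                   total_epsilon: float,
                   rng: random.Random) -> Dict[str, Dict[Hashable, float]]:
    """Offline phase: spend the entire budget once, split evenly across views.

    Each (hypothetical) view is a GROUP BY count of one attribute over a
    public domain; with one row per individual its sensitivity is 1.
    """
    eps_per_view = total_epsilon / len(views)  # sequential composition
    synopses: Dict[str, Dict[Hashable, float]] = {}
    for name, (attr, domain) in views.items():
        counts = Counter(row[attr] for row in rows)
        # Noise every domain value, including empty bins, so the set of
        # released bins does not leak which values occur.
        synopses[name] = {v: counts[v] + laplace_noise(1.0 / eps_per_view, rng)
                          for v in domain}
    return synopses


def answer(synopses: Dict[str, Dict[Hashable, float]], view: str,
           predicate: Callable[[Hashable], bool]) -> float:
    """Online phase: answer a COUNT query from a synopsis only.

    This is post-processing of private output, so it consumes no further
    budget regardless of how many queries the analyst submits.
    """
    return sum(c for v, c in synopses[view].items() if predicate(v))
```

Usage: after `syn = build_synopses(rows, {"by_age": ("age", range(100))}, 1.0, random.Random())`, a query such as `answer(syn, "by_age", lambda a: a < 30)` can be repeated or varied indefinitely without raising the total privacy loss above 1.0.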

Additionally, PrivSQL employs a variety of novel techniques, such as view selection for differential privacy, policy-aware view rewriting, and view truncation. These techniques allow PrivSQL to offer automatic support for custom-tailored privacy semantics and to keep query-answering error low.
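
The idea behind truncation can be shown in miniature. In a multi-relational view, one individual may own many rows (for instance, many trips), so a plain count has large or unbounded sensitivity. Keeping at most k rows per individual bounds the sensitivity at k, at the cost of some bias from the dropped rows. The sketch below is a generic illustration, not PrivSQL's truncation operator: the functions `truncate` and `truncated_private_count`, the `owner` column, and the assumption that removing an individual removes only the rows tagged with their id (no cascading effects on other individuals' rows) are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List

Row = Dict[str, Hashable]


def truncate(rows: List[Row], owner: str, k: int) -> List[Row]:
    """Keep at most k rows per individual, identified by column `owner`."""
    kept: List[Row] = []
    per_owner: Dict[Hashable, int] = defaultdict(int)
    for row in rows:
        if per_owner[row[owner]] < k:
            per_owner[row[owner]] += 1
            kept.append(row)
    return kept


def truncated_private_count(rows: List[Row], owner: str, k: int,
                            epsilon: float, rng: random.Random) -> float:
    """COUNT(*) over the truncated view with Laplace noise of scale k / epsilon.

    After truncation, adding or removing one individual changes the view
    by at most k rows, so the count's sensitivity is k.
    """
    view = truncate(rows, owner, k)
    scale = k / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return len(view) + noise
```

The threshold k trades error sources against each other: a small k means less noise but more rows discarded from heavy contributors, while a large k keeps more data but requires proportionally more noise.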

Citation

Kotsogiannis, Ios (2019). Query Answering in Multi-Relational Databases Under Differential Privacy. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/19877.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.