Answering and Explaining SQL Queries Privately

dc.contributor.advisor

Machanavajjhala, Ashwin

dc.contributor.author

Tao, Yuchao

dc.date.accessioned

2022-09-21T13:54:45Z

dc.date.available

2023-09-16T08:17:16Z

dc.date.issued

2022

dc.department

Computer Science

dc.description.abstract

Data privacy has received increasing attention in recent years. Although personal information is collected at large scale for scientific research and commercial products, a privacy breach is not an acceptable trade-off. Over the last decade, differential privacy has become the gold standard for protecting data privacy and has been adopted by many organizations. Past work has focused on building differentially private SQL query answering systems as a building block for wider applications. However, answering counting queries with joins under differential privacy remains a challenge. The join operator allows a single user to have an unbounded impact on the query result, which makes it difficult to hide the presence of that user under differential privacy. In addition, introducing differential privacy into query answering makes it harder for users to interpret query results correctly, since they need to distinguish the effect of differential privacy from the contribution of the data.

In this thesis, we study two problems in answering and explaining SQL queries privately. First, we present efficient algorithms to compute the local sensitivities of counting queries with joins, which is an important prerequisite for answering these queries under differential privacy. We track the sensitivity contributed by each tuple and, based on it, propose a truncation mechanism that answers counting queries with joins privately with high utility. Second, we propose DPXPlain, a formal three-phase framework that allows users to get explanations for group-by COUNT/SUM/AVG query results while preserving differential privacy. We use confidence intervals to help users understand the uncertainty that differential privacy introduces into the query results, and further provide top-k explanations under differential privacy to explain the contribution of the data to the results.

dc.identifier.uri

https://hdl.handle.net/10161/25782

dc.subject

Computer science

dc.subject

Database

dc.subject

Differential privacy

dc.subject

Explanation

dc.subject

SQL

dc.title

Answering and Explaining SQL Queries Privately

dc.type

Dissertation

duke.embargo.months

11.835616438356164

Files

Original bundle

Name:
Tao_duke_0066D_16905.pdf
Size:
1.54 MB
Format:
Adobe Portable Document Format
