Applying Differential Privacy with Sparse Vector Technique

Chen, Yan

Applying Differential Privacy with Sparse Vector Technique

View / Download3.34 MB

Date

2018

Authors

Chen, Yan

Advisors

Machanavajjhala, Ashwin

Repository Usage Stats

280
views

989
downloads

Abstract

In today's fast-paced developing digital world, a wide range of services such as web services, social networks, and mobile devices collect a large amount of

personal data from their users. Although sharing and mining large-scale personal data can help improve the functionality of

these services, it also raises privacy concerns for the individuals who contribute to the data.

Differential privacy has emerged as a de facto standard for analyzing sensitive data with strong provable privacy guarantees for individuals. There is a rich literature that

has led to the development of differentially private algorithms for numerous data analysis tasks. The privacy proof of these algorithms are mainly based on (a) the privacy guarantees

of a small number of primitives, and (b) a set of composition theorems that help reason about the privacy guarantee of algorithms based on the used private primitives. In this dissertation, we focus on the usage of one popular differentially private primitive, Sparse Vector Technique, which can support multiple queries with limited privacy cost.

First, we revisit the original Sparse Vector Technique and its variants, proving that many of its variants violate the definition of differential privacy. Furthermore, we design an attack

algorithm demonstrating that an adversary can reconstruct the true database with high accuracy having access to these ``broken" variants.

Next, we utilize the original Sparse Vector Technique primitive to design new solutions for practical problems. We propose the first algorithms to publish regression diagnostics

under differential privacy for evaluating regression models. Specifically, we create differentially private versions of residual plots for linear regression as well as receiver operating characteristic (ROC) curves and binned residual plot for logistic regression. Comprehensive empirical studies show these algorithms are effective and enable users to evaluate the correctness of their model assumptions.

We then make use of Sparse Vector Technique as a key primitive to design a novel algorithm for differentially private stream processing, supporting queries on streaming data. This novel algorithm is data adaptive and can simultaneously support multiple queries, such as unit counts, sliding windows and event monitoring, over a single or multiple stream resolutions. Through extensive evaluations, we show that this new technique outperforms the state-of-the-art algorithms, which are specialized to particular query types.

Type

Dissertation

Department

Computer Science

Subjects

Computer science, Data privacy, Differential privacy

Permalink

https://hdl.handle.net/10161/16906

Citation

Chen, Yan (2018). Applying Differential Privacy with Sparse Vector Technique. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/16906.

Collections

Dissertations

Full item page

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.

Applying Differential Privacy with Sparse Vector Technique

Date

Authors

Advisors

Journal Title

Journal ISSN

Volume Title

Repository Usage Stats

Abstract

Type

Department

Description

Provenance

Subjects

Citation

Permalink

Citation

Collections