Applying Differential Privacy with Sparse Vector Technique
In today's fast-paced developing digital world, a wide range of services such as web services, social networks, and mobile devices collect a large amount of
personal data from their users. Although sharing and mining large-scale personal data can help improve the functionality of
these services, it also raises privacy concerns for the individuals who contribute to the data.
Differential privacy has emerged as a de facto standard for analyzing sensitive data with strong provable privacy guarantees for individuals. There is a rich literature that
has led to the development of differentially private algorithms for numerous data analysis tasks. The privacy proof of these algorithms are mainly based on (a) the privacy guarantees
of a small number of primitives, and (b) a set of composition theorems that help reason about the privacy guarantee of algorithms based on the used private primitives. In this dissertation, we focus on the usage of one popular differentially private primitive, Sparse Vector Technique, which can support multiple queries with limited privacy cost.
First, we revisit the original Sparse Vector Technique and its variants, proving that many of its variants violate the definition of differential privacy. Furthermore, we design an attack
algorithm demonstrating that an adversary can reconstruct the true database with high accuracy having access to these ``broken" variants.
Next, we utilize the original Sparse Vector Technique primitive to design new solutions for practical problems. We propose the first algorithms to publish regression diagnostics
under differential privacy for evaluating regression models. Specifically, we create differentially private versions of residual plots for linear regression as well as receiver operating characteristic (ROC) curves and binned residual plot for logistic regression. Comprehensive empirical studies show these algorithms are effective and enable users to evaluate the correctness of their model assumptions.
We then make use of Sparse Vector Technique as a key primitive to design a novel algorithm for differentially private stream processing, supporting queries on streaming data. This novel algorithm is data adaptive and can simultaneously support multiple queries, such as unit counts, sliding windows and event monitoring, over a single or multiple stream resolutions. Through extensive evaluations, we show that this new technique outperforms the state-of-the-art algorithms, which are specialized to particular query types.
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations