A Privacy Preserving Algorithm to Release Sparse High-dimensional Histograms

Li, Bai

Date

2017

Authors

Li, Bai

Advisors

Steorts, Rebecca C

Abstract

Differential privacy (DP) aims to design methods and algorithms that satisfy rigorous notions of privacy while simultaneously providing utility with valid statistical inference. More recently, an emphasis has been placed on combining notions of statistical utility with algorithmic approaches to address privacy risk in the presence of big data, with differential privacy emerging as a rigorous notion of risk. While DP provides strong guarantees for privacy, there are often tradeoffs regarding data utility and computational scalability. In this paper, we introduce a categorical data synthesizer that releases high-dimensional sparse histograms, illustrating its ability to overcome limitations of existing data synthesizers in the literature. Specifically, we combine a differentially private algorithm, the stability-based algorithm, with feature hashing, which reduces the dimension of the histograms, and Gibbs sampling. As a result, our proposed algorithm is differentially private, offers similar or better statistical utility, and is scalable to large databases. In addition, we give an analytical result for the error caused by the stability-based algorithm, which allows us to control the loss of utility. Finally, we study the behavior of our algorithm on both simulated and real data.
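To make the two building blocks named in the abstract concrete, the following is a minimal sketch, not the thesis's implementation. It assumes the common textbook form of the stability-based histogram: Laplace noise of scale 2/ε is added only to non-empty bins, and a bin is released only if its noisy count reaches the threshold 1 + 2 ln(2/δ)/ε, which is usually stated to give (ε, δ)-differential privacy. The function names (`hash_bucket`, `laplace`, `stability_histogram`), the SHA-256-based hashing, and all parameter values are illustrative assumptions. The Gibbs sampling step is not shown.

```python
import hashlib
import math
import random
from collections import Counter


def hash_bucket(record, num_buckets):
    """Feature hashing (illustrative): map a categorical record to a bucket index."""
    digest = hashlib.sha256(repr(record).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def laplace(rng, scale):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def stability_histogram(bins, epsilon, delta, rng):
    """Release noisy counts for non-empty bins that clear the stability threshold.

    Assumed textbook form: noise scale 2/epsilon, threshold
    1 + 2*ln(2/delta)/epsilon. Empty bins are never touched, so the cost
    depends on the number of observed bins, not on the full domain size.
    """
    counts = Counter(bins)
    threshold = 1.0 + 2.0 * math.log(2.0 / delta) / epsilon
    released = {}
    for bucket, count in counts.items():
        noisy = count + laplace(rng, 2.0 / epsilon)
        if noisy >= threshold:
            released[bucket] = noisy
    return released


# Hypothetical data: one heavily populated cell plus 50 singleton cells.
rng = random.Random(0)
records = [("a", "x")] * 1000 + [("rare", i) for i in range(50)]
bins = [hash_bucket(r, 2**20) for r in records]
histogram = stability_histogram(bins, epsilon=1.0, delta=1e-6, rng=rng)
```

In this hypothetical run the threshold is about 30, so the heavily populated cell is released with a count near 1000 while the singleton cells are suppressed. This illustrates why the approach suits sparse histograms: small cells are dropped rather than drowned in noise.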

Type

Master's thesis

Department

Statistical Science

Subjects

Statistics, Computer science, Differential privacy, Methodology

Permalink

https://hdl.handle.net/10161/15252

Citation

Li, Bai (2017). A Privacy Preserving Algorithm to Release Sparse High-dimensional Histograms. Master's thesis, Duke University. Retrieved from https://hdl.handle.net/10161/15252.

Collections

Masters Theses

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.
