Fairness in Differentially Private Data Release


Date

2022


Abstract

Data privacy has received increasing attention in the past decade. Large-scale data collection has become the norm for both scientific and commercial uses, and private information about individuals is often leaked through data releases, aggregate statistics, and machine learning models. Differential privacy has become the gold standard for limiting such leakage. Differentially private mechanisms work by infusing noise into private data releases, obfuscating the contribution of any single individual. However, it is unclear how the noise introduced by differential privacy affects the utility experienced by different population groups. Recent work has shown that typical uses of such private data, such as machine learning and allocation tasks, can yield different utilities across population groups, and it is unknown how differential privacy interacts with these existing inequities.
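The "noise infusion" described above refers to standard additive-noise mechanisms. As a concrete illustration (our sketch, not a mechanism from the dissertation), the classic Laplace mechanism below releases a count privately; `sensitivity` and `epsilon` are the usual differential privacy parameters.

```python
import numpy as np

def laplace_mechanism(true_count, sensitivity=1.0, epsilon=1.0):
    """Release a count with differential privacy by adding Laplace noise.

    sensitivity: the most a single individual can change the count.
    epsilon: privacy budget; smaller epsilon means more noise, more privacy.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Example: a census-style count of 1,000 people in some group.
# The noise hides any one individual's presence in the data.
private_count = laplace_mechanism(1000, sensitivity=1.0, epsilon=0.5)
```

Because the noise scale depends only on `sensitivity / epsilon`, every group's count receives noise of the same magnitude regardless of the group's size, which is where questions of per-group utility arise.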

We investigate the effects of differential privacy on the downstream utility experienced by different population groups. First, we study the downstream effects of naive applications of differential privacy on well-known census tasks. We show that differential privacy can magnify the impact of existing inequities as well as introduce inequities that were not previously present. Likewise, we show that a carefully constructed mechanism with knowledge of the task at hand can reduce such inequities. We propose two frameworks for acknowledging and addressing these possible inequities: query answering and synthetic data. For query answering, we propose a multi-analyst approach in which representatives of each group can share resources to maximize utility while ensuring a fair distribution of utility across groups. For synthetic data, we propose a framework for ensuring fair and private synthetic data: our approach creates a private synthetic dataset that preserves statistics from the original dataset while limiting the influence of known protected classes in classification tasks.
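To see how a naive application of differential privacy can disadvantage smaller groups, the hypothetical sketch below (our illustration; the group names and sizes are invented) compares the relative error that noise of a fixed scale induces on counts of different magnitudes. Equal absolute noise translates into much larger relative error for the smaller group, one way uniform noise can skew downstream allocation decisions.

```python
import numpy as np

rng = np.random.default_rng(0)
epsilon = 0.5
group_sizes = {"large_group": 100_000, "small_group": 500}

# Same noise scale (1/epsilon) for every group, so the smaller group's
# count is proportionally far more distorted.
for name, true_count in group_sizes.items():
    noisy = true_count + rng.laplace(scale=1.0 / epsilon, size=10_000)
    rel_err = np.mean(np.abs(noisy - true_count)) / true_count
    print(f"{name}: mean relative error = {rel_err:.2%}")
```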

Citation

Pujol, David Anthony (2022). Fairness in Differentially Private Data Release. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/26813.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.