Data Privacy via Integration of Differential Privacy and Data Synthesis

Doctoral Dissertation

Abstract

When sharing data among collaborators or releasing data publicly, one of the crucial concerns is the risk of exposing personal information about the individuals who contribute to the data. Many statistical methods for data privacy and confidentiality have little or no means of measuring the privacy guarantee of an altered data set. Differential privacy, a condition on data-releasing algorithms, quantifies disclosure risk, but it is traditionally used in query-based privacy methods rather than in the release of synthetic data sets. My dissertation develops and explores various methods of incorporating differential privacy into synthetic data generation using predicted values within a Bayesian framework. I call these methods differentially private data synthesis (DIPS) techniques. In my dissertation, I first conducted a comparative study of several DIPS approaches on various data types, as well as a case study on the Male Fertility data. Next, I created a method (called SPECKS) to compare DIPS data to real-life data, and another method to improve the statistical inferences of non-parametric DIPS approaches. These methods were tested on voter registration data. Finally, I developed a DIPS technique for social network data called Noisy Edges and Traits (NET) and applied it to two real-life data sets.
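To make the general idea of combining differential privacy with Bayesian data synthesis concrete, the following is a minimal sketch for a single categorical variable. It is an illustration under stated assumptions, not one of the specific DIPS methods developed in the dissertation: it assumes the Laplace mechanism on the histogram of counts (L1 sensitivity 1 under add/remove-one-record neighbors), a symmetric Dirichlet prior, and synthesis by a posterior predictive draw. The function names `laplace_histogram` and `synthesize_categorical` and the parameter `alpha` are hypothetical.

```python
import numpy as np


def laplace_histogram(counts, epsilon, rng):
    """Release a count vector under epsilon-differential privacy (Laplace mechanism).

    Assumption: neighboring data sets differ by adding or removing one record,
    so the L1 sensitivity of the histogram is 1 and the Laplace scale is 1 / epsilon.
    """
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    # Clipping negative counts is post-processing, so it does not weaken the guarantee.
    return np.clip(counts + noise, 0.0, None)


def synthesize_categorical(data, n_categories, epsilon, n_synth, alpha=1.0, seed=None):
    """Draw synthetic categorical records from a posterior built on noisy counts."""
    rng = np.random.default_rng(seed)
    counts = np.bincount(data, minlength=n_categories).astype(float)
    noisy_counts = laplace_histogram(counts, epsilon, rng)
    # Symmetric Dirichlet(alpha) prior + sanitized counts -> Dirichlet posterior
    # over the cell probabilities; draw one set of probabilities from it.
    probs = rng.dirichlet(noisy_counts + alpha)
    # Sampling records given the drawn probabilities yields a posterior predictive draw.
    return rng.choice(n_categories, size=n_synth, p=probs)


if __name__ == "__main__":
    original = np.array([0, 1, 1, 2, 2, 2, 0, 1, 2, 2])
    synthetic = synthesize_categorical(original, n_categories=3, epsilon=1.0, n_synth=10, seed=42)
    print("original counts: ", np.bincount(original, minlength=3))
    print("synthetic counts:", np.bincount(synthetic, minlength=3))
```

The original data are touched only once, through the noisy histogram; the Dirichlet draw and the sampling of synthetic records are post-processing, so the released synthetic data inherit the epsilon-differential privacy guarantee. Smaller values of `epsilon` add more noise and typically make the synthetic counts drift further from the original ones.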

Attributes

Alternate Title: Differentially private data synthesis
Author: Claire McKay Bowen
Contributor: Fang Liu, Research Director
Degree Level: Doctoral Dissertation
Degree Discipline: Applied and Computational Mathematics and Statistics
Degree Name: PhD
Defense Date: 2018-03-27
Submission Date: 2018-04-03
Subject:
  • data privacy
  • differential privacy
  • data synthesis
Record Visibility and Access: Public
Content License: All rights reserved
