Some Methods for Differentially Private Data Synthesis

Doctoral Dissertation

Abstract

Balancing the privacy of individuals who contribute to data sets against the release of data sets with good utility is extremely important. Even when data sets are anonymized, there is still a possibility that an intruder may identify a subject in a released data set. Many existing methods for data privacy and confidentiality do not quantify the amount of private information that a released data set may leak. Differential privacy provides a conceptual framework that offers a strong mathematical guarantee of privacy protection and quantifies the privacy loss incurred when a data set is released for public use. My dissertation explores recently developed differentially private data synthesis (DIPS) methods, which incorporate differential privacy when generating synthetic data for public release. I first developed a DIPS algorithm called CIPHER that constructs differentially private microdata from low-dimensional histograms by solving linear equations with Tikhonov regularization. CIPHER decomposes joint probabilities via basic probability rules to construct the equation set and then solves the resulting linear equations. Simulations and a qualitative banking data case study were conducted to compare CIPHER with two existing methods: MWEM (multiplicative weighting via exponential mechanism) and full-dimensional histogram (FDH) sanitization. Next, my dissertation focuses on an exponential random graph model that incorporates differential privacy for social network data. Social network data carry an additional level of complexity because the possibly many relationships between nodes and edges must be considered. The algorithm developed in my work was applied to several real-life data sets to understand how well the differentially private synthetic social network data released by the algorithm compare to the original network. Lastly, my dissertation focuses on multiplicative weights and the single observation influence measure. This part explores MWEM in more depth and incorporates a single observation influence measure so that the algorithm can be applied to any type of data, as long as the model and sufficient statistics are known.
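As a rough illustration of the baseline idea the abstract calls full-dimensional histogram (FDH) sanitization, the sketch below adds Laplace noise to every cell of a full cross-tabulation and then samples synthetic records from the sanitized cell probabilities. It is not taken from the dissertation: the function names, the toy data, the choice of the Laplace mechanism, the clipping of negative counts, and the assumption that neighboring data sets differ by adding or removing one record (so each count has sensitivity 1) are all illustrative assumptions. It does not implement CIPHER or MWEM.

```python
import random
from collections import Counter
from itertools import product


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sanitize_histogram(records, levels, epsilon, rng):
    """Return noisy cell probabilities for the full cross-tabulation.

    Assumption: neighboring data sets differ by one added or removed
    record, so the histogram has L1 sensitivity 1 and the Laplace
    scale is 1 / epsilon.
    """
    counts = Counter(records)
    cells = list(product(*levels))  # every cell, including empty ones
    noisy = {c: counts.get(c, 0) + laplace_noise(1.0 / epsilon, rng)
             for c in cells}
    # Post-processing (costs no extra privacy budget): clip and normalize.
    clipped = {c: max(v, 0.0) for c, v in noisy.items()}
    total = sum(clipped.values())
    if total == 0.0:
        return {c: 1.0 / len(cells) for c in cells}
    return {c: v / total for c, v in clipped.items()}


def synthesize(probs, n, rng):
    # Draw n synthetic records from the sanitized cell probabilities.
    # Assumption: the sample size n is treated as public.
    cells = list(probs)
    weights = [probs[c] for c in cells]
    return rng.choices(cells, weights=weights, k=n)


# Hypothetical toy data: two binary attributes, 100 records.
rng = random.Random(0)
levels = [("no", "yes"), ("low", "high")]
records = ([("no", "low")] * 40 + [("no", "high")] * 10
           + [("yes", "low")] * 20 + [("yes", "high")] * 30)
probs = sanitize_histogram(records, levels, epsilon=1.0, rng=rng)
synthetic = synthesize(probs, n=len(records), rng=rng)
```

The number of cells grows multiplicatively with the number of attributes, so the noise is spread over many sparse cells in higher dimensions. Working from low-dimensional histograms, as the abstract describes for CIPHER, is one way to avoid that growth.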

Attributes

Attribute Name Values
Author Evercita Cuevas Eugenio
Contributor Ick-Hoon Jin, Committee Member
Contributor Fang Liu, Research Director
Contributor Lizhen Lin, Committee Member
Degree Level Doctoral Dissertation
Degree Discipline Applied and Computational Mathematics and Statistics
Degree Name Doctor of Philosophy
Banner Code PHD-ACMS
Defense Date 2019-06-04
Submission Date 2019-07-05
Subject
  • Statistics
  • Data Privacy
  • Differential Privacy
Record Visibility and Access Public
Content License All rights reserved
