University of Notre Dame

Some Methods for Differentially Private Data Synthesis

thesis
posted on 2019-07-05, 00:00 authored by Evercita Cuevas Eugenio

Balancing the privacy of individuals who contribute to data sets against the release of data sets with good utility is extremely important. Even when a data set is anonymized, there is still a possibility that an intruder may identify a subject in the released data. Many existing methods for data privacy and confidentiality do not quantify the amount of privacy that a data set may leak. Differential privacy provides a conceptual approach that brings a strong mathematical guarantee of privacy protection and quantifies the amount of privacy a data set leaks when it is released for public use. My dissertation explores recently developed differentially private data synthesis (DIPS) methods, which incorporate differential privacy when generating synthetic data for public release. I first developed a DIPS algorithm called CIPHER, which constructs differentially private microdata from low-dimensional histograms by solving linear equations with Tikhonov regularization. CIPHER decomposes joint probabilities via basic probability rules to construct the equation set and then solves the resulting linear equations. Simulations and a case study on qualitative banking data were conducted to compare CIPHER with two existing methods: MWEM (multiplicative weighting via exponential mechanism) and full-dimensional histogram (FDH) sanitization. Next, my dissertation focuses on an exponential random graph model that incorporates differential privacy for social network data. Social network data carry an additional level of complexity, as the possibly many relationships between nodes and edges must be considered. The algorithm developed in my work was applied to several real-life data sets to understand how well the differentially private synthetic social network data released by the algorithm compare with the original network. Lastly, my dissertation focuses on multiplicative weights and the single observation influence measure.
This last part explores the multiplicative weighting via exponential mechanism in more depth and incorporates a single observation influence measure so that the algorithm can be applied to any type of data, as long as the model and sufficient statistics are known.
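
The two building blocks named in the abstract, histogram sanitization and a Tikhonov-regularized linear solve, can be illustrated with a minimal sketch. This is not the CIPHER algorithm from the dissertation: the function names `laplace_histogram` and `tikhonov_solve`, the toy 2x2 table, the privacy budget split, and the regularization strength `lam` are all assumptions made for illustration. The sketch assumes neighboring data sets differ by adding or removing one record, so each histogram has L1 sensitivity 1 and Laplace noise with scale 1/epsilon applies.

```python
import numpy as np

def laplace_histogram(counts, epsilon, rng):
    """Sanitize histogram counts with the Laplace mechanism.

    Assumes L1 sensitivity 1 (add/remove one record), so the noise
    scale is 1/epsilon. Returns nonnegative proportions summing to 1.
    """
    counts = np.asarray(counts, dtype=float)
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    noisy = np.clip(noisy, 0.0, None)  # post-processing: no extra privacy cost
    total = noisy.sum()
    if total == 0:
        # Fall back to a uniform distribution if all cells were clipped to 0.
        return np.full(counts.shape, 1.0 / counts.size)
    return noisy / total

def tikhonov_solve(A, b, lam):
    """Solve min_x ||A x - b||^2 + lam * ||x||^2 via the normal equations."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)

# Toy example (hypothetical data): two binary attributes X and Y.
rng = np.random.default_rng(0)
x_counts = [60, 40]  # one-way histogram of X
y_counts = [70, 30]  # one-way histogram of Y

# Split a total budget of 1.0 evenly across the two histograms
# (sequential composition); the split is an illustrative choice.
px = laplace_histogram(x_counts, epsilon=0.5, rng=rng)
py = laplace_histogram(y_counts, epsilon=0.5, rng=rng)

# Unknown joint cells ordered as [p00, p01, p10, p11].
# Each row of A sums the joint cells that make up one marginal cell.
A = np.array([
    [1, 1, 0, 0],  # P(X=0)
    [0, 0, 1, 1],  # P(X=1)
    [1, 0, 1, 0],  # P(Y=0)
    [0, 1, 0, 1],  # P(Y=1)
])
b = np.concatenate([px, py])

# A has rank 3, so the system is underdetermined; the Tikhonov penalty
# picks out a unique, stable solution.
joint_est = tikhonov_solve(A, b, lam=1e-3)
print(joint_est)
```

The regularization term matters here because the marginal constraints alone do not determine the joint cells uniquely, and noisy right-hand sides can make an unregularized solve unstable. The solution is not guaranteed to be a valid probability vector, so a practical pipeline would likely clip and renormalize it before sampling synthetic records.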

History

Date Modified

2019-08-28

Defense Date

2019-06-04

CIP Code

  • 27.9999

Research Director(s)

Fang Liu

Committee Members

Ick-Hoon Jin, Lizhen Lin

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Alternate Identifier

1112064916

Library Record

5187111

OCLC Number

1112064916

Program Name

  • Applied and Computational Mathematics and Statistics
