File(s) under permanent embargo
Data Privacy via Integration of Differential Privacy and Data Synthesis
When sharing data among collaborators or releasing data publicly, a crucial concern is the risk of exposing personal information about the individuals who contributed to the data. Many statistical methods for data privacy and confidentiality offer little or no means of measuring the privacy guarantee of an altered data set. Differential privacy, a condition on data-release algorithms, quantifies disclosure risk, but it is traditionally applied to query-based release rather than to the release of synthetic data sets. My dissertation develops and explores methods of incorporating differential privacy into synthetic data generation using predicted values within a Bayesian framework. I call these methods differentially private data synthesis (DIPS) techniques. I first conducted a comparative study of several DIPS approaches on various data types, along with a case study on Male Fertility data. Next, I created a method (called SPECKS) to compare DIPS data to real-life data, and another method to improve the statistical inferences of non-parametric DIPS approaches. Both methods were tested on voter registration data. Finally, I developed a DIPS technique for social network data called Noisy Edges and Traits (NET) and applied it to two real-life data sets.
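As a rough illustration of what releasing synthetic data under differential privacy can look like, the sketch below perturbs category counts with Laplace noise and then samples synthetic records from the noisy proportions. It is a generic textbook-style example, not one of the dissertation's DIPS methods (it is not Bayesian and is unrelated to SPECKS or NET). The function name `dp_histogram_synthesis`, its parameters, and the toy data are hypothetical, and the noise scale assumes neighboring data sets differ by adding or removing one record.

```python
import numpy as np

def dp_histogram_synthesis(values, categories, epsilon, n_synthetic, rng):
    """Hypothetical helper: synthesize categorical data from Laplace-noised counts."""
    # Count how many original records fall into each category.
    counts = np.array([np.sum(values == c) for c in categories], dtype=float)
    # Assumption: adding or removing one record changes exactly one count by 1,
    # so the L1 sensitivity is 1 and the Laplace scale is 1 / epsilon.
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    # Clipping, normalizing, and sampling only post-process the noisy counts.
    noisy = np.clip(noisy, 0.0, None)
    if noisy.sum() == 0:
        probs = np.full(len(categories), 1.0 / len(categories))
    else:
        probs = noisy / noisy.sum()
    # Draw synthetic records from the noisy category distribution.
    return rng.choice(categories, size=n_synthetic, p=probs)

rng = np.random.default_rng(0)
original = np.array(["a", "b", "a", "c", "a", "b"])  # toy data, not from the dissertation
synthetic = dp_histogram_synthesis(original, ["a", "b", "c"], epsilon=1.0,
                                   n_synthetic=50, rng=rng)
```

Smaller values of `epsilon` add more noise, which strengthens the privacy guarantee while pulling the synthetic proportions further from the original ones.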
History
Alt Title
Differentially private data synthesis
Date Created
2018-04-03
Date Modified
2018-10-30
Defense Date
2018-03-27
Research Director(s)
Fang Liu
Degree
- Doctor of Philosophy
Degree Level
- Doctoral Dissertation
Program Name
- Applied and Computational Mathematics and Statistics