When sharing data among collaborators or releasing data publicly, one of the crucial concerns is the risk of exposing personal information about the individuals who contribute to the data. Many statistical methods for data privacy and confidentiality offer little or no means of measuring the privacy guarantee of an altered data set. Differential privacy, a condition imposed on data-releasing algorithms, quantifies disclosure risk, but it has traditionally been used in query-based privacy methods rather than in synthetic data release. My dissertation develops and explores various methods for incorporating differential privacy into synthetic data generation using predicted values within a Bayesian framework. I call these methods differentially private data synthesis (DIPS) techniques. I first conducted a comparative study of several DIPS approaches on various data types, as well as a case study on Male Fertility data. Next, I created a method (called SPECKS) to compare DIPS data with real-life data, and another method to improve the statistical inferences of non-parametric DIPS approaches. These methods were tested on voter registration data. Finally, I developed a DIPS technique for social network data, called Noisy Edges and Traits (NET), and applied it to two real-life data sets.
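For readers new to differential privacy, the following is a generic illustration. It is not one of the DIPS techniques described above. A randomized mechanism M is ε-differentially private if, for any two data sets D and D' that differ in one individual's record and any set of outputs S, P(M(D) ∈ S) ≤ e^ε · P(M(D') ∈ S). The sketch below shows the standard Laplace mechanism applied to a single counting query, which is the query-based setting the abstract contrasts with synthetic data release. The function names `laplace_noise` and `dp_count` are hypothetical and are used only for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. Exponential(rate = 1/scale) draws
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon, rng=None):
    """Hypothetical helper: release a count under epsilon-DP via the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng if rng is not None else random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one record changes a count by at most 1,
    # so the L1 global sensitivity of this query is 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Usage: a noisy count of hypothetical respondents older than 40.
ages = [23, 35, 41, 52, 67, 38, 44]
noisy = dp_count(ages, lambda a: a > 40, epsilon=0.5, rng=random.Random(1))
```

Smaller values of ε add more noise and give stronger privacy. Each additional released query consumes further privacy budget, which is one reason synthetic data release is an attractive alternative to answering queries one at a time.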
Data Privacy via Integration of Differential Privacy and Data Synthesis

Doctoral Dissertation
| Field | Value |
|---|---|
| Author | Claire McKay Bowen |
| Contributor | Fang Liu, Research Director |
| Degree Level | Doctoral Dissertation |
| Degree Discipline | Applied and Computational Mathematics and Statistics |
| Record Visibility and Access | Public |
| Departments and Units | |