University of Notre Dame

File(s) under permanent embargo

Noise Injection and Noise Augmentation for Model Regularization, Differential Privacy and Statistical Learning

thesis
Posted on 2020-03-25, authored by Yinan Li

Parametric statistical models tend to over-fit the training data, and thus fail to generalize to new data, when redundant features are included or the signal-to-noise ratio is low. One effective and convenient approach to mitigating over-fitting is regularization, which has been studied extensively for parametric models. However, regularization methods are largely dedicated to parameter shrinkage, which on the one hand incurs bias in parameter estimation and on the other hand seldom brings additional generalization-promoting effects. To address these drawbacks, I proposed and implemented Noise Injection (NI) and Noise Augmentation (NA) methods in a variety of models and studied their properties theoretically. My dissertation includes:

  • Whiteout in neural networks, which adaptively injects noise into nodes to achieve regularization effects and promote robustness.
  • The fast Converging and Robust Optimal Path Selection (CROPS) procedure for the continuous-time Markov-switching generalized autoregressive conditional heteroskedasticity (COMS-GARCH) process. CROPS is a Bernoulli-NI-enhanced Markov Chain Expectation Maximization (MC-EM) algorithm that improves accuracy in both hidden-path identification and volatility estimation, and achieves ensemble-learning and robustness effects.
  • AdaPtive Noise Augmentation (PANDA) in Generalized Linear Models (GLMs). PANDA realizes a wide range of existing regularization effects, as well as exact L0 regularization with little computational burden through the orthogonal regularization I proposed; it also provides tighter confidence intervals with higher coverage probability for both zero and non-zero estimated parameters under variable-selection regularization.
  • PANDA in Undirected Graphical Models (UGMs), which realizes both the likelihood-based graphical L0 regularization I proposed for Gaussian Graphical Models (GGMs) and existing neighborhood-selection methods in UGMs.
  • Adaptive Noise Augmentation for differentially Private (NAP) Empirical Risk Minimization (ERM). NAP-ERM mitigates the over-regularization in existing work, hence improving utility, while simultaneously achieving regularization and differential privacy (DP) through noise augmentation. I further introduced the idea of retrieving wasted privacy budget through NAP-ERM.

As future work, I extended PANDA L0 regularization to Support Vector Machines (SVMs) and generalized the concept of orthogonal regularization to realize rank regularization in both multiple-response GLMs and tensor regressions. I expect the combination of rank regularization and NAP-ERM to show high utility while guaranteeing DP, and I expect that a more rigorous proof for graphical L0 can be obtained through duality.
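For readers unfamiliar with noise-based regularization, the sketch below illustrates the core identity behind NA and NI in the simplest possible setting: plain least squares with numpy. This is a minimal textbook illustration under stated assumptions, not the PANDA, whiteout, or NAP-ERM algorithms from the dissertation. Appending sqrt(lambda)-scaled pseudo-rows with zero responses makes ordinary least squares reproduce ridge regression exactly (the augmentation flavor), and injecting Gaussian noise into the inputs yields the same shrinkage in expectation (the injection flavor). All variable names and the toy data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 200, 5, 3.0
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, -1.0, 0.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Closed-form ridge estimate, the target both schemes should recover:
#   argmin_b ||y - X b||^2 + lam * ||b||^2
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# --- Noise augmentation flavor --------------------------------------
# Append sqrt(lam) * I as p pseudo-rows with zero responses; the
# augmented residual sum of squares is ||y - Xb||^2 + lam * ||b||^2,
# so plain OLS on the augmented data *is* the ridge estimate.
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])
b_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
assert np.allclose(b_aug, b_ridge)

# --- Noise injection flavor ------------------------------------------
# Perturb the inputs with N(0, lam/n) noise.  In expectation the noisy
# normal equations are (X'X + lam*I) b = X'y, i.e. exactly ridge, so
# pooling many noisy copies converges to the same shrinkage.
M, sigma = 2000, np.sqrt(lam / n)
G, g = np.zeros((p, p)), np.zeros(p)
for _ in range(M):
    Xn = X + rng.normal(scale=sigma, size=X.shape)
    G += Xn.T @ Xn
    g += Xn.T @ y
b_ni = np.linalg.solve(G / M, g / M)
print(np.round(b_ridge, 3))
print(np.round(b_ni, 3))  # agrees with b_ridge up to Monte Carlo error
```

As the abstract describes, the dissertation's methods extend this correspondence well beyond ridge, designing the noise distribution so that the augmented or injected objective matches a target penalty (including exact L0) without solving the explicit penalized optimization.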

History

Date Modified

2020-05-13

Defense Date

2020-03-05

CIP Code

  • 27.9999

Research Director(s)

Fang Liu

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Alternate Identifier

1154012655

Library Record

5501934

OCLC Number

1154012655

Program Name

  • Applied and Computational Mathematics and Statistics
