Robust Estimation and Generative Adversarial Nets

Robust estimation under Huber's $\epsilon$-contamination model has become an important topic in statistics and theoretical computer science. Statistically optimal procedures such as Tukey's median and other estimators based on depth functions are impractical because of their computational intractability. In this paper, we establish an intriguing connection between $f$-GANs and various depth functions through the lens of $f$-Learning. Similar to the derivation of $f$-GANs, we show that the depth functions that lead to statistically optimal robust estimators can all be viewed as variational lower bounds of the total variation distance in the framework of $f$-Learning. This connection opens the door to computing robust estimators with tools developed for training GANs. In particular, we show in both theory and experiments that appropriately structured discriminator networks with hidden layers lead to statistically optimal robust location estimators for both the Gaussian distribution and general elliptical distributions whose first moment may not exist.
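To make the variational lower bound concrete, recall the standard dual representation of the total variation distance over discriminators $D$ taking values in $[0,1]$:
$$\mathsf{TV}(P,Q) \;=\; \sup_{0 \le D \le 1} \left\{ \mathbb{E}_P[D(X)] - \mathbb{E}_Q[D(X)] \right\}.$$
Restricting the supremum to a discriminator class $\mathcal{D}$ and replacing $\mathbb{E}_P$ by the empirical average over a contaminated sample $X_1,\dots,X_n \sim (1-\epsilon)N(\theta, I_p) + \epsilon Q$ suggests a robust location estimator of the generic form
$$\widehat{\theta} \;=\; \mathop{\mathrm{argmin}}_{\eta} \, \max_{D \in \mathcal{D}} \left\{ \frac{1}{n}\sum_{i=1}^n D(X_i) - \mathbb{E}_{N(\eta, I_p)}\, D(X) \right\},$$
a minimax problem of exactly the kind GAN training is designed to solve.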
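As an illustration of how GAN training machinery can compute such an estimator, the following is a minimal sketch that alternates gradient steps on a small sigmoid-output discriminator and on the location parameter $\eta$. The architecture, optimizer, learning rates, and the helper name tv_gan_location are illustrative assumptions, not the exact configuration studied in the paper.

```python
# Minimal sketch of a TV-GAN style robust location estimator.
# Assumes data from (1 - eps) * N(theta, I) + eps * Q; the network,
# step sizes, and iteration counts below are illustrative choices.
import torch

def tv_gan_location(x, n_iter=2000, d_steps=5, lr=0.02):
    """Estimate the location of the clean Gaussian component.

    x: (n, p) tensor of contaminated observations.
    Approximates argmin_eta max_D { E_data[D] - E_{N(eta, I)}[D] },
    a variational lower bound on the total variation distance.
    """
    n, p = x.shape
    # Coordinatewise median as a robust initialization for eta.
    eta = x.median(dim=0).values.clone().requires_grad_(True)
    # One-hidden-layer discriminator with output constrained to [0, 1].
    disc = torch.nn.Sequential(
        torch.nn.Linear(p, 8), torch.nn.ReLU(),
        torch.nn.Linear(8, 1), torch.nn.Sigmoid(),
    )
    opt_d = torch.optim.SGD(disc.parameters(), lr=lr)
    opt_g = torch.optim.SGD([eta], lr=lr)
    for _ in range(n_iter):
        for _ in range(d_steps):  # inner maximization over D
            z = torch.randn(n, p) + eta.detach()  # fake sample from N(eta, I)
            loss_d = -(disc(x).mean() - disc(z).mean())
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        z = torch.randn(n, p) + eta  # reparameterized, so grads reach eta
        loss_g = disc(x).mean() - disc(z).mean()  # outer minimization
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return eta.detach()

# Usage: 20% of points replaced by far-away outliers.
torch.manual_seed(0)
n, p, eps = 500, 5, 0.2
x = torch.randn(n, p) + 2.0                                # clean: N(2*ones, I)
x[: int(eps * n)] = torch.randn(int(eps * n), p) + 10.0    # contamination
print(tv_gan_location(x))                                  # expect roughly 2*ones
```

The inner loop plays the role of the depth-function maximization, while the outer loop moves $\eta$ toward the deepest point; in practice the paper's experiments rely on standard GAN training heuristics for this alternation.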
