Robust Estimation and Generative Adversarial Nets

Robust estimation under Huber's $\epsilon$-contamination model has become an important topic in statistics and theoretical computer science. Statistically optimal procedures such as Tukey's median and other estimators based on depth functions are impractical because of their computational intractability. In this paper, we establish an intriguing connection between $f$-GANs and various depth functions through the lens of $f$-Learning. Similar to the derivation of $f$-GANs, we show that the depth functions that lead to statistically optimal robust estimators can all be viewed as variational lower bounds of the total variation distance in the framework of $f$-Learning. This connection opens the door to computing robust estimators with tools developed for training GANs. In particular, we show in both theory and experiments that appropriately structured discriminator networks with hidden layers lead to statistically optimal robust location estimators for both the Gaussian distribution and general elliptical distributions whose first moment may not exist.
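To make the variational lower bound concrete, recall the standard dual representation of the total variation distance over discriminators $D$ taking values in $[0,1]$:
$$\mathsf{TV}(P,Q) \;=\; \sup_{0 \le D \le 1} \left\{ \mathbb{E}_P[D(X)] - \mathbb{E}_Q[D(X)] \right\}.$$
Restricting the supremum to a discriminator class $\mathcal{D}$ and replacing $\mathbb{E}_P$ by the empirical average over a contaminated sample $X_1,\dots,X_n \sim (1-\epsilon)N(\theta, I_p) + \epsilon Q$ suggests a robust location estimator of the generic form
$$\widehat{\theta} \;=\; \mathop{\mathrm{argmin}}_{\eta} \, \max_{D \in \mathcal{D}} \left\{ \frac{1}{n}\sum_{i=1}^n D(X_i) - \mathbb{E}_{N(\eta, I_p)}\, D(X) \right\},$$
a minimax problem of exactly the kind GAN training is designed to solve.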
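As an illustration of how GAN training machinery can compute such an estimator, the following is a minimal sketch that alternates gradient steps on a small sigmoid-output discriminator and on the location parameter $\eta$. The architecture, optimizer, learning rates, and the helper name tv_gan_location are illustrative assumptions, not the exact configuration studied in the paper.

```python
# Minimal sketch of a TV-GAN style robust location estimator.
# Assumes data from (1 - eps) * N(theta, I) + eps * Q; the network,
# step sizes, and iteration counts below are illustrative choices.
import torch

def tv_gan_location(x, n_iter=2000, d_steps=5, lr=0.02):
    """Estimate the location of the clean Gaussian component.

    x: (n, p) tensor of contaminated observations.
    Approximates argmin_eta max_D { E_data[D] - E_{N(eta, I)}[D] },
    a variational lower bound on the total variation distance.
    """
    n, p = x.shape
    # Coordinatewise median as a robust initialization for eta.
    eta = x.median(dim=0).values.clone().requires_grad_(True)
    # One-hidden-layer discriminator with output constrained to [0, 1].
    disc = torch.nn.Sequential(
        torch.nn.Linear(p, 8), torch.nn.ReLU(),
        torch.nn.Linear(8, 1), torch.nn.Sigmoid(),
    )
    opt_d = torch.optim.SGD(disc.parameters(), lr=lr)
    opt_g = torch.optim.SGD([eta], lr=lr)
    for _ in range(n_iter):
        for _ in range(d_steps):  # inner maximization over D
            z = torch.randn(n, p) + eta.detach()  # fake sample from N(eta, I)
            loss_d = -(disc(x).mean() - disc(z).mean())
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        z = torch.randn(n, p) + eta  # reparameterized, so grads reach eta
        loss_g = disc(x).mean() - disc(z).mean()  # outer minimization
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return eta.detach()

# Usage: 20% of points replaced by far-away outliers.
torch.manual_seed(0)
n, p, eps = 500, 5, 0.2
x = torch.randn(n, p) + 2.0                                # clean: N(2*ones, I)
x[: int(eps * n)] = torch.randn(int(eps * n), p) + 10.0    # contamination
print(tv_gan_location(x))                                  # expect roughly 2*ones
```

The inner loop plays the role of the depth-function maximization, while the outer loop moves $\eta$ toward the deepest point; in practice the paper's experiments rely on standard GAN training heuristics for this alternation.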
