Learning under (1 + ϵ)-moment conditions

Abstract

We study the theoretical underpinnings of a robust empirical risk minimization (RERM) scheme that has found numerous successful applications across data science owing to its robustness to outliers and heavy-tailed noise. RERM is distinguished by its nonconvexity and by the fact that it is induced by a loss function with a built-in scale parameter that trades off robustness against prediction accuracy. This nonconvexity and built-in scale parameter also raise barriers to assessing its learning performance theoretically. In this paper, we make the following main contributions to the study of RERM. First, we establish a no-free-lunch result showing that distribution-free learning of the truth is impossible without adjusting the scale parameter. Second, by imposing a (1+ϵ)-th order moment condition (with ϵ > 0) on the response variable, we establish a comparison theorem that relates the excess generalization error of RERM to its prediction error. Third, with a diverging scale parameter, we establish almost sure convergence rates for RERM under the (1+ϵ)-moment condition. Notably, this condition permits noise with infinite variance. Finally, our learning theory analysis of RERM showcases both its robustness and the trade-off role played by the scale parameter, and it yields insights into robust machine learning more broadly.
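To make the scale parameter's trade-off concrete, the sketch below implements RERM for linear regression with the correntropy-induced (Welsch) loss ℓ_σ(r) = σ²(1 − exp(−r²/σ²)), a common instance of the bounded nonconvex losses in this literature. The abstract does not fix a specific loss or solver, so welsch_loss, rerm_linear_fit, and the half-quadratic (iteratively reweighted least squares) iteration here are illustrative assumptions, not the paper's method. Small σ caps each observation's contribution at σ² (robustness), while σ → ∞ recovers ordinary least squares (accuracy under light tails), mirroring the diverging-scale-parameter regime of the convergence analysis.

```python
import numpy as np

def welsch_loss(residual, sigma):
    """Correntropy-induced (Welsch) loss: sigma^2 * (1 - exp(-r^2 / sigma^2)).

    Behaves like the squared loss r^2 for |r| << sigma and saturates at
    sigma^2 for large |r|, which bounds the influence of outliers.
    """
    return sigma**2 * (1.0 - np.exp(-residual**2 / sigma**2))

def rerm_linear_fit(X, y, sigma, n_iter=100):
    """Minimize the empirical Welsch risk for a linear model.

    Uses half-quadratic / IRLS iterations: each residual r_i gets the
    weight exp(-r_i^2 / sigma^2), so points far from the current fit are
    downweighted. As sigma -> inf all weights tend to 1 and the update
    reduces to ordinary least squares.
    """
    w = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares warm start
    for _ in range(n_iter):
        r = y - X @ w
        weights = np.exp(-r**2 / sigma**2)
        # One half-quadratic step: a weighted least-squares solve
        # (small ridge term for numerical stability).
        WX = X * weights[:, None]
        w = np.linalg.solve(X.T @ WX + 1e-12 * np.eye(X.shape[1]), WX.T @ y)
    return w

# Demo on heavy-tailed data: Pareto noise with tail index 1.5 has a finite
# (1+eps)-th moment only for eps < 0.5 and infinite variance -- the regime
# the (1+eps)-moment condition is designed to cover.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.pareto(1.5, size=500)
w_hat = rerm_linear_fit(X, y, sigma=2.0)
```

In this toy setting the IRLS fit typically stays close to the true coefficients despite the infinite-variance noise, whereas plain least squares is pulled by the extreme residuals; re-running with a much larger sigma makes the two estimates nearly coincide, illustrating the robustness-accuracy trade-off the scale parameter controls.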
