A Framework of Learning Through Empirical Gain Maximization

We develop in this letter a framework of empirical gain maximization (EGM) to address the robust regression problem in which heavy-tailed noise or outliers may be present in the response variable. The idea of EGM is to approximate the density function of the noise distribution rather than, as is usual, approximating the truth function directly. Unlike classical maximum likelihood estimation, which assigns equal importance to all observations and can therefore be problematic in the presence of abnormal observations, EGM schemes can be interpreted from a minimum distance estimation viewpoint and allow such observations to be downweighted or ignored. Furthermore, we show that several well-known robust nonconvex regression paradigms, such as Tukey regression and truncated least squares regression, can be reformulated within this new framework. We then develop a learning theory for EGM, by means of which a unified analysis can be conducted for these well-established but not fully understood regression approaches. The new framework also yields a novel interpretation of existing bounded nonconvex loss functions. Within it, two seemingly unrelated notions, Tukey's biweight loss from robust regression and the triweight kernel from nonparametric smoothing, turn out to be closely related: we show that Tukey's biweight loss can be derived from the triweight kernel. Other bounded nonconvex loss functions frequently employed in machine learning, such as the truncated square loss, the Geman-McClure loss, and the exponential squared loss, can likewise be derived from certain smoothing kernels in statistics. In addition, the new framework enables us to devise new bounded nonconvex loss functions for robust learning.
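
As a concrete illustration of the kernel-to-loss correspondence described above, the following minimal Python sketch (our own illustration, not code from the letter; the tuning constant c = 4.685 and all function names are assumptions made here) checks numerically that Tukey's biweight loss equals, up to the scale factor c²/6, one minus the rescaled triweight kernel of the scaled residual, and then uses the resulting gain in a toy one-dimensional EGM location estimate that shrugs off a gross outlier:

```python
import numpy as np

# Triweight smoothing kernel: K(u) = (35/32)(1 - u^2)^3 for |u| <= 1, else 0.
def triweight_kernel(u):
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, (35.0 / 32.0) * (1.0 - u**2) ** 3, 0.0)

# Tukey's biweight loss with tuning constant c (c = 4.685 is a common default):
# rho(t) = (c^2/6)(1 - (1 - (t/c)^2)^3) for |t| <= c, and c^2/6 otherwise.
def tukey_biweight_loss(t, c=4.685):
    u = np.asarray(t, dtype=float) / c
    return np.where(np.abs(u) <= 1.0,
                    (c**2 / 6.0) * (1.0 - (1.0 - u**2) ** 3),
                    c**2 / 6.0)

# Identity: rho(t) = (c^2/6) * (1 - (32/35) * K(t/c)) for all t, so minimizing
# Tukey's loss is the same as maximizing the triweight-kernel gain.
c = 4.685
t = np.linspace(-3 * c, 3 * c, 4001)
reconstructed = (c**2 / 6.0) * (1.0 - (32.0 / 35.0) * triweight_kernel(t / c))
assert np.allclose(tukey_biweight_loss(t, c), reconstructed)

# Toy EGM: estimate a location parameter by maximizing the empirical gain
# (1/n) * sum_i K((y_i - theta) / c) over a grid; compare with the sample mean.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 50), [30.0]])  # one gross outlier
grid = np.linspace(-5.0, 35.0, 4001)
gains = [triweight_kernel((y - theta) / c).mean() for theta in grid]
theta_egm = grid[int(np.argmax(gains))]
print(f"mean = {y.mean():.2f}, EGM estimate = {theta_egm:.2f}")
```

The outlier falls outside the kernel's support, so it contributes zero gain and the EGM estimate stays near the true location, whereas the sample mean is pulled toward it. The grid search merely stands in for whatever optimization scheme one would actually use; the point is only the loss-kernel identity and the bounded influence of the outlier.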
