Learning with Correntropy-induced Losses for Regression with Mixture of Symmetric Stable Noise

In recent years, correntropy and its applications in machine learning have drawn continuous attention owing to its merits in dealing with non-Gaussian noise and outliers. However, theoretical understanding of correntropy, especially in the statistical learning context, is still limited. In this study, within the statistical learning framework, we investigate correntropy-based regression in the presence of non-Gaussian noise or outliers. Motivated by the practical ways in which non-Gaussian noise and outliers arise, we introduce the mixture of symmetric stable noise, which includes Gaussian noise, Cauchy noise, and their mixtures as special cases, to model such noise. We demonstrate that under the mixture of symmetric stable noise assumption, correntropy-based regression can learn the conditional mean function or the conditional median function well without resorting to a finite-variance, or even a finite first-order moment, condition on the noise. In particular, for these two cases we establish asymptotically optimal learning rates of type $\mathcal{O}(n^{-1})$ for correntropy-based regression estimators. These results justify the effectiveness of correntropy-based regression estimators in dealing with outliers as well as non-Gaussian noise. We believe the present study advances the understanding of correntropy-based regression from a statistical learning viewpoint, and may also shed light on robust statistical learning for regression.
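The setting described above can be sketched numerically. The snippet below is a minimal illustration, not the estimator analyzed in the paper: it generates a linear model corrupted by a mixture of symmetric stable noise (mostly Gaussian, alpha = 2, plus a fraction of heavy-tailed Cauchy draws, alpha = 1) and fits the coefficients by minimizing the correntropy-induced (Welsch) loss via half-quadratic, i.e. iteratively reweighted least-squares, optimization. All parameter choices (kernel bandwidth `sigma`, contamination fraction, iteration count) are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear model y = x . w_true + eps, where eps is a mixture of
# symmetric stable noise: mostly Gaussian (alpha = 2), with a fraction of
# heavy-tailed Cauchy draws (alpha = 1) acting as outliers.
n = 2000
X = rng.uniform(-1.0, 1.0, size=(n, 2))
w_true = np.array([2.0, -1.0])
is_cauchy = rng.random(n) < 0.2
eps = np.where(is_cauchy, rng.standard_cauchy(n), 0.1 * rng.standard_normal(n))
y = X @ w_true + eps

def correntropy_regression(X, y, sigma=1.0, iters=50):
    """Minimize the correntropy-induced (Welsch) loss
    sigma^2 * (1 - exp(-r^2 / (2 sigma^2))) over a linear model,
    using iteratively reweighted least squares (half-quadratic optimization)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        r = y - X @ w
        # Residual weights from the Gaussian kernel: huge residuals get
        # weights near zero, so heavy-tailed outliers barely contribute.
        u = np.exp(-r**2 / (2.0 * sigma**2))
        # Weighted least-squares update: solve (X^T U X) w = X^T U y.
        w = np.linalg.solve(X.T @ (u[:, None] * X), X.T @ (u * y))
    return w

w_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares baseline
w_corr = correntropy_regression(X, y)
print("true coefficients:", w_true)
print("least squares    :", np.round(w_ls, 3))
print("correntropy      :", np.round(w_corr, 3))
```

Because the Cauchy component has no finite mean, least squares can be dragged arbitrarily far by a single extreme draw, whereas the correntropy weights `u` suppress such samples and the estimate stays close to `w_true`, consistent with the robustness claims above.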
