A New Correntropy-Based Conjugate Gradient Backpropagation Algorithm for Improving Training in Neural Networks

Mean square error (MSE) is the most prominent criterion for training neural networks and has been employed in numerous learning problems. In this paper, we propose a family of novel robust information-theoretic backpropagation (BP) methods, referred to as correntropy-based conjugate gradient BP (CCG-BP). CCG-BP algorithms converge faster than common correntropy-based BP algorithms and outperform common MSE-based CG-BP algorithms, especially in non-Gaussian environments and in cases with impulsive noise or heavy-tailed noise distributions. In addition, a convergence analysis of this new class of methods is presented. Numerical results for several examples of function approximation, synthetic function estimation, and chaotic time series prediction illustrate that the new BP method is more robust than the MSE-based method in the presence of impulsive noise, especially when the SNR is low.
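To make the contrast concrete, the following is a minimal sketch (not the paper's implementation) of the two ingredients the abstract names: a correntropy-based loss, whose Gaussian kernel bounds the influence of impulsive errors that would dominate MSE, and a Fletcher–Reeves-style conjugate gradient direction update. The kernel width `sigma` and the toy error vector are illustrative assumptions.

```python
import numpy as np

def mse_loss(e):
    # Mean square error: grows quadratically, so a single outlier dominates.
    return np.mean(e ** 2)

def correntropy_loss(e, sigma=1.0):
    # Maximum correntropy criterion expressed as a loss:
    # 1 - mean Gaussian kernel of the errors. Each term lies in (0, 1],
    # so an impulsive error contributes at most 1, not e^2.
    return 1.0 - np.mean(np.exp(-e ** 2 / (2.0 * sigma ** 2)))

def fletcher_reeves_direction(g, g_prev, d_prev):
    # Nonlinear CG search direction: d_k = -g_k + beta_k * d_{k-1},
    # with the Fletcher-Reeves choice beta_k = ||g_k||^2 / ||g_{k-1}||^2.
    beta = (g @ g) / (g_prev @ g_prev)
    return -g + beta * d_prev

# Three small errors plus one impulsive outlier:
errors = np.array([0.1, -0.2, 0.05, 10.0])
print(mse_loss(errors))          # dominated by the outlier
print(correntropy_loss(errors))  # bounded in [0, 1)
```

Under a CCG-style scheme, the gradient of a correntropy loss like this one would replace the MSE gradient inside the conjugate gradient update, which is what limits the pull of impulsive samples on the search direction.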
