Efficient and robust deep learning with Correntropy-induced loss function

Abstract Deep learning systems use hierarchical models to learn high-level features from low-level features, and they have made great progress in recent years. The robustness of learning systems with deep architectures, however, is rarely studied and needs further investigation. In particular, the mean square error (MSE), a commonly used optimization cost in deep learning, is rather sensitive to outliers (or impulsive noise). Robust methods are needed to improve learning performance and to guard against the harmful influence of outliers, which are pervasive in real-world data. In this paper, we propose an efficient and robust deep learning model based on stacked auto-encoders and the Correntropy-induced loss function (CLF), called CLF-based stacked auto-encoders (CSAE). CLF, a nonlinear measure of similarity, is robust to outliers and can approximate different norms of the data (from $$l_0$$ to $$l_2$$). Essentially, CLF is the MSE computed in a reproducing kernel Hilbert space. Unlike conventional stacked auto-encoders, which generally use the MSE as the reconstruction loss and the KL divergence as the sparsity penalty, both the reconstruction loss and the sparsity penalty in CSAE are built with CLF. The fine-tuning procedure in CSAE is also based on CLF, which further enhances learning performance. The efficiency and robustness of the proposed model are confirmed by simulation experiments on the MNIST benchmark dataset.
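To illustrate how a Correntropy-induced loss tempers outliers compared with the MSE, the following minimal NumPy sketch implements the form commonly reported in the correntropy literature, CLF(e) = beta * [1 - exp(-e^2 / (2*sigma^2))], using a Gaussian kernel of bandwidth sigma. The bandwidth value, the normalization constant beta, and the helper names (clf_loss, mse_loss) are illustrative assumptions, not the exact formulation or implementation used in the paper.

```python
import numpy as np

def mse_loss(errors):
    """Mean square error: grows quadratically, so a single outlier dominates."""
    return np.mean(errors ** 2)

def clf_loss(errors, sigma=1.0):
    """Correntropy-induced loss with a Gaussian kernel of bandwidth sigma.

    Each error contributes at most beta, so large (outlier) errors saturate
    instead of dominating the cost; for small errors it behaves like the MSE.
    """
    # beta normalizes the loss so that it equals 1 at |e| = 1.
    beta = 1.0 / (1.0 - np.exp(-1.0 / (2.0 * sigma ** 2)))
    return np.mean(beta * (1.0 - np.exp(-errors ** 2 / (2.0 * sigma ** 2))))

# Reconstruction errors that are mostly small, plus one impulsive outlier.
errors = np.array([0.1, -0.2, 0.05, 0.15, 10.0])

print("MSE:", mse_loss(errors))  # ~20: dominated by the single outlier
print("CLF:", clf_loss(errors))  # ~0.53: the outlier's contribution saturates at beta
```

Because the per-sample contribution is bounded, the gradient from an impulsive error vanishes rather than growing with the error, which is the property the paper exploits when replacing the MSE in the reconstruction and sparsity terms.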
