Pairwise Gaussian Loss for Convolutional Neural Networks

Convolutional neural networks (CNNs) have proven highly effective at feature representation and, as a result, perform well on many classification tasks. Cross-entropy loss combined with softmax, commonly called softmax loss, is arguably one of the most widely used loss functions in CNNs. However, softmax loss can yield weakly discriminative feature representations because it focuses on interclass separability rather than intraclass compactness. This article proposes a pairwise Gaussian loss (PGL) for CNNs that addresses intraclass compactness by heavily penalizing similar sample pairs that lie relatively far apart, while still ensuring good interclass separability. Experiments show that CNNs trained with PGL achieve better classification performance than those trained with softmax loss or with other loss functions commonly used in CNNs. The experiments also show that PGL converges stably under stochastic gradient descent and generalizes well across different CNN architectures.
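The abstract does not give the PGL formula, so the following is only a minimal sketch of the general idea it describes: a pairwise loss in which same-class pairs are penalized more the farther apart they lie, while different-class pairs are penalized only when they are close. The Gaussian similarity exp(-beta * d^2), the cross-entropy form, and the names `pairwise_gaussian_loss`, `beta`, and `eps` are all assumptions made for illustration and should not be read as the authors' definition.

```python
import math


def pairwise_gaussian_loss(features, labels, beta=1.0, eps=1e-12):
    """Illustrative pairwise loss with a Gaussian similarity.

    ASSUMPTION: this form is a guess at the general idea in the abstract,
    not the formula from the paper. `beta` and `eps` are hypothetical.
    """
    n = len(features)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            # Squared Euclidean distance between the two feature vectors.
            d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            # Gaussian similarity in (0, 1]: 1 when identical, decays with distance.
            p = math.exp(-beta * d2)
            if labels[i] == labels[j]:
                # Similar pair: -log(p) = beta * d2, so the penalty grows
                # quadratically with the distance (intraclass compactness).
                total += beta * d2
            else:
                # Dissimilar pair: penalized only when the features are close
                # (interclass separability); eps guards against log(0).
                total += -math.log(max(1.0 - p, eps))
            pairs += 1
    # Average over all pairs; a batch with fewer than two samples has no pairs.
    return total / pairs if pairs else 0.0


# Two nearby points and one distant point, under two labelings.
feats = [[0.0, 0.0], [0.1, 0.0], [3.0, 0.0]]
compact = pairwise_gaussian_loss(feats, [0, 0, 1])  # nearby points share a class
spread = pairwise_gaussian_loss(feats, [0, 1, 0])   # distant points share a class
print(compact < spread)
```

Under this assumed form, the labeling in which the two nearby points share a class gives a much smaller loss than the one in which the two distant points share a class, which matches the behavior the abstract attributes to PGL.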
