$\sigma^2$R Loss: a Weighted Loss by Multiplicative Factors using Sigmoidal Functions

In neural networks, the loss function is the core of the learning process: it guides the optimizer toward an approximation of the optimal convergence error. Convolutional neural networks (CNNs) use the loss function as a supervisory signal to train deep models, and they have contributed significantly to state-of-the-art results in several fields of computer vision. Cross-entropy and center loss are commonly used to increase the discriminative power of the learned features and to improve the generalization performance of the model. Center loss minimizes the intra-class variance and, at the same time, penalizes long distances between the deep features of each class. However, the total error of the center loss is heavily influenced by the majority of the instances and can lead to a freezing state in terms of intra-class variance. To address this, we introduce a new loss function called sigma squared reduction loss ($\sigma^2$R loss), which is regulated by a sigmoid function that inflates or deflates the error of each instance, so that the intra-class variance keeps decreasing. Our loss has a clear intuition and a geometric interpretation. Furthermore, we demonstrate experimentally the effectiveness of our proposal on several benchmark datasets, showing a reduction of the intra-class variance and outperforming the results obtained with the center loss and soft nearest neighbour loss functions.
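To make the geometric intuition concrete, the following is a minimal pure-Python sketch under stated assumptions. `center_loss` follows the usual center-loss definition, $\frac{1}{2}\sum_i \lVert x_i - c_{y_i}\rVert^2$ averaged over the batch, but the class centers are computed as batch means instead of being learned and updated during training (a simplifying assumption). `sigmoid_weighted_center_loss` is a hypothetical illustration of per-instance inflation/deflation through a sigmoid and is NOT the $\sigma^2$R formula: the weight $2\,\mathrm{sigmoid}(k(d_i - \bar d))$ and the `steepness` parameter $k$ are assumptions chosen only to show the mechanism.

```python
import math
from typing import Dict, Hashable, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_centers(features: Sequence[Vector], labels: Sequence[Hashable]) -> Dict[Hashable, List[float]]:
    """Per-class mean feature vector.

    Assumption: real center loss learns the centers during training;
    here they are simply the batch means, to keep the sketch small.
    """
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def center_loss(features, labels, centers) -> float:
    """0.5 * sum_i ||x_i - c_{y_i}||^2, averaged over the batch."""
    n = len(features)
    return 0.5 * sum(sq_dist(f, centers[y]) for f, y in zip(features, labels)) / n


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_weighted_center_loss(features, labels, centers, steepness: float = 1.0) -> float:
    """HYPOTHETICAL reweighting, not the sigma^2R loss.

    Each squared distance d_i is multiplied by 2 * sigmoid(steepness * (d_i - mean_d)),
    a factor in (0, 2): instances farther than the batch-average distance are
    inflated (> 1), closer ones are deflated (< 1), and an instance exactly at
    the average distance keeps weight 1.
    """
    d = [sq_dist(f, centers[y]) for f, y in zip(features, labels)]
    mean_d = sum(d) / len(d)  # average over the whole batch, all classes together
    weights = [2.0 * sigmoid(steepness * (di - mean_d)) for di in d]
    return 0.5 * sum(w * di for w, di in zip(weights, d)) / len(d)


if __name__ == "__main__":
    # One class with three tight points and one far point at (3, 0).
    feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 0.0]]
    labels = [0, 0, 0, 0]
    centers = class_centers(feats, labels)
    print("center loss:", center_loss(feats, labels, centers))
    print("sigmoid-weighted:", sigmoid_weighted_center_loss(feats, labels, centers))
```

In this toy batch the weighted variant returns a larger value than plain center loss, because the far instance receives a weight close to 2 while the three nearby instances are deflated below 1. When all distances are equal, every weight is exactly 1 and the two losses coincide.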
