A Unit Softmax with Laplacian Smoothing Stochastic Gradient Descent for Deep Convolutional Neural Networks

Several techniques have been proposed in recent years to improve the performance of deep architectures through appropriate loss functions or activation functions. Softmax is arguably the conventional choice for training Deep Convolutional Neural Networks (DCNNs) on classification tasks; however, modern deep learning architectures have exposed its limited feature discriminability. In this paper, we provide a supervision signal for learning discriminative image features by modifying softmax to strengthen the loss function. Amending the original softmax loss, and motivated by the A-Softmax loss for face recognition, we fix the angular margin to introduce a unit-margin softmax loss. The resulting alternative to softmax is trainable, easy to optimize, stable when used with either Stochastic Gradient Descent (SGD) or Laplacian Smoothing Stochastic Gradient Descent (LS-SGD), and applicable to digit classification in images. Experimental results demonstrate state-of-the-art performance on the well-known Modified National Institute of Standards and Technology (MNIST) database of handwritten digits.
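Neither the loss nor the optimizer is spelled out in this abstract, so the following is a minimal PyTorch sketch of the two ingredients as we read them, not the authors' implementation. The first block interprets the unit-margin softmax as A-Softmax with its angular margin fixed at m = 1: class weights are normalized onto the unit hypersphere and biases are dropped, so each logit becomes ||x|| cos(theta_j). The class name and all variable names are ours.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class UnitSoftmaxLoss(nn.Module):
        """Softmax cross-entropy with unit-norm class weights and no bias,
        i.e. A-Softmax with the angular margin fixed at m = 1 (a sketch of
        the "unit softmax" as we read it; names are hypothetical)."""

        def __init__(self, feat_dim, num_classes):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))

        def forward(self, features, labels):
            # logit_j = ||x|| * cos(theta_j): normalize only the weights,
            # keeping the feature norm so the logit scale is preserved.
            w = F.normalize(self.weight, dim=1)
            cosines = F.linear(F.normalize(features, dim=1), w)
            logits = features.norm(dim=1, keepdim=True) * cosines
            return F.cross_entropy(logits, labels)

The second block sketches one step of Laplacian Smoothing SGD in the sense of Osher et al.: each flattened gradient is pre-multiplied by (I - sigma * L)^(-1), where L is the discrete periodic 1-D Laplacian, and the FFT makes this an O(n log n) operation. The function name and default sigma are assumptions; sigma = 0 recovers plain SGD.

    def laplacian_smoothed_step(params, lr=0.1, sigma=1.0):
        """One LS-SGD step (sketch): smooth each gradient by solving
        (I - sigma * L) d = g in the Fourier domain, then descend along d."""
        for p in params:
            if p.grad is None:
                continue
            g = p.grad.detach().reshape(-1)
            # First column of the circulant Laplacian stencil [1, -2, 1].
            v = torch.zeros_like(g)
            v[0] = -2.0
            if g.numel() > 1:
                v[1] = 1.0
                v[-1] = 1.0
            # Eigenvalues of L are fft(v); 1 - sigma * fft(v) is never zero.
            d = torch.fft.ifft(torch.fft.fft(g) / (1.0 - sigma * torch.fft.fft(v))).real
            p.data.add_(d.reshape(p.shape), alpha=-lr)

In a training loop one would compute the loss on the embedding features, call loss.backward(), and then apply laplacian_smoothed_step(model.parameters()) in place of optimizer.step(); the smoothing damps high-frequency components of the gradient while leaving its mean unchanged.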
