Parametric Rectified Power Sigmoid Units: Learning Nonlinear Neural Transfer Analytical Forms

The paper proposes representation functionals in a dual paradigm where learning jointly concerns both linear convolutional weights and parametric forms of nonlinear activation functions. The nonlinear forms proposed for performing the functional representation belong to a new class of parametric neural transfer functions called rectified power sigmoid units. This class is constructed to combine the advantages of sigmoid and rectified linear unit (ReLU) functions while avoiding the drawbacks of each. Moreover, the analytic form of this class involves scale, shift, and shape parameters, yielding a wide range of activation shapes that includes the standard ReLU as a limit case. The parameters of this transfer class are treated as learnable so that complex shapes useful for solving machine learning problems can be discovered from data. The performance achieved by jointly learning convolutional weights and rectified power sigmoid parameters is shown to be outstanding in both shallow and deep learning frameworks. This class opens new prospects for machine learning in the sense that learnable parameters are attached not only to linear transformations but also to suitable nonlinear operators.
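To illustrate the dual-learning idea, the sketch below defines a learnable activation module in PyTorch whose scale, shift, and shape parameters are optimized jointly with the convolutional weights. The functional form used here (a shifted, powered sigmoid gating the identity) is a hypothetical stand-in rather than the paper's exact rectified power sigmoid expression, and the names `scale`, `shift`, and `shape` are likewise illustrative.

```python
import torch
import torch.nn as nn

class LearnableGatedActivation(nn.Module):
    """Illustrative parametric activation with learnable scale, shift,
    and shape parameters (a stand-in form, not the paper's exact
    rectified power sigmoid expression)."""

    def __init__(self, scale=1.0, shift=0.0, shape=1.0):
        super().__init__()
        # All three parameters are trainable, just like the weights
        # of the linear (convolutional) layers.
        self.scale = nn.Parameter(torch.tensor(float(scale)))
        self.shift = nn.Parameter(torch.tensor(float(shift)))
        self.shape = nn.Parameter(torch.tensor(float(shape)))

    def forward(self, x):
        # A shifted, powered sigmoid gates the identity. As `scale`
        # grows (with shift=0, shape=1) the gate sharpens toward a
        # unit step, so the output approaches ReLU as a limit case.
        gate = torch.sigmoid(self.scale * (x - self.shift))
        return x * gate ** self.shape

# Joint optimization: a single optimizer updates both the convolution
# weights and the activation's scale/shift/shape parameters.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    LearnableGatedActivation(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```

With `shift = 0` and `shape = 1`, letting `scale` grow large recovers the standard ReLU, mirroring the limit behavior described in the abstract; smaller or intermediate parameter values produce the smooth, sigmoid-like shapes the class is designed to cover.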
