Neural Networks with Smooth Adaptive Activation Functions for Regression

In neural networks (NNs), Adaptive Activation Functions (AAFs) have parameters that control the shape of the activation function and are trained jointly with the other parameters of the network. AAFs have improved NN performance on multiple classification tasks. In this paper, we propose and apply AAFs to feedforward NNs for regression tasks. We argue that applying AAFs in the regression (second-to-last) layer of an NN can significantly decrease the bias of the regression NN. However, existing AAFs may lead to overfitting. To address this problem, we propose a Smooth Adaptive Activation Function (SAAF) with a piecewise polynomial form that can approximate any continuous function to any desired degree of accuracy. NNs with SAAFs can avoid overfitting simply by regularizing their parameters: in particular, an NN with SAAFs is Lipschitz continuous given a bound on the magnitude of the NN parameters. We prove an upper bound on model complexity, in terms of the fat-shattering dimension, for any Lipschitz continuous regression model; thus, regularizing the parameters of an NN with SAAFs avoids overfitting. We empirically evaluate NNs with SAAFs and achieve state-of-the-art results on multiple regression datasets.
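The mechanism the abstract relies on can be illustrated with a minimal sketch. The snippet below is not the paper's exact SAAF formulation; it is a simpler continuous piecewise-linear adaptive activation built from hinge basis functions, where the per-hinge slopes play the role of the trainable shape parameters. Its Lipschitz constant is bounded by the sum of the slope magnitudes, so penalizing those parameters (e.g. with an L2 regularizer) directly bounds the function's smoothness, which is the property the overfitting argument uses.

```python
import numpy as np

def adaptive_pwl(x, breakpoints, slopes, bias=0.0):
    """Continuous piecewise-linear adaptive activation (illustrative sketch,
    not the paper's exact SAAF).

    y(x) = bias + sum_k slopes[k] * max(0, x - breakpoints[k])

    `slopes` are the trainable shape parameters. The derivative of y is a
    partial sum of the slopes, so |y'| <= sum_k |slopes[k]| everywhere:
    bounding the parameter magnitudes bounds the Lipschitz constant.
    """
    x = np.asarray(x, dtype=float)
    y = np.full_like(x, bias)
    for b, s in zip(breakpoints, slopes):
        y += s * np.maximum(0.0, x - b)  # hinge basis function at b
    return y

# Two hinges at 0 and 1; slopes are the adaptive parameters.
y = adaptive_pwl([0.0, 1.0, 2.0], breakpoints=[0.0, 1.0], slopes=[1.0, -0.5])
# Lipschitz bound from the parameters: |1.0| + |-0.5| = 1.5
```

With enough breakpoints, such a piecewise form can approximate any continuous function on a bounded interval arbitrarily well, while the parameter-norm bound (rather than the number of pieces) controls model complexity.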
