Self-Adaptive Layer: An Application of Function Approximation Theory to Enhance Convergence Efficiency in Neural Networks

Neural networks provide a general architecture for modeling complex nonlinear systems, but the source data are often contaminated with noise and interfering information. A common way to smooth over this issue during training is to increase the number of neurons or layers, at the cost of a larger model. In this paper, a new self-adaptive layer is developed to overcome these problems, achieving faster convergence and helping the network avoid local minima. We incorporate function approximation theory into the arrangement of the layer's elements, so that the training process and the network's approximation properties can be analyzed with linear algebra, and so that the precision of adaptation can be controlled by the order of the polynomials used. Experimental results show that the proposed layer converges significantly faster and thereby improves training accuracy. Moreover, the design is simple to implement and can be deployed in most current systems.
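
To make the idea concrete, the sketch below shows one plausible reading of such a layer in PyTorch: each feature is expanded in a polynomial basis and the layer learns one coefficient per basis term, so the polynomial order directly controls the precision of adaptation. This is a minimal sketch under stated assumptions, not the paper's implementation; the class name SelfAdaptiveLayer, the choice of a Chebyshev basis, and the `order` hyperparameter are all illustrative.

```python
import torch
import torch.nn as nn

class SelfAdaptiveLayer(nn.Module):
    """Hypothetical polynomial-basis adaptive layer (illustrative only).

    Each input feature x is expanded in Chebyshev polynomials
    T_0(x), ..., T_k(x); the layer learns one coefficient per feature
    and basis term, so `order` controls the precision of adaptation.
    """
    def __init__(self, num_features: int, order: int = 4):
        super().__init__()
        self.order = order
        # One learnable coefficient per (feature, basis term) pair.
        self.coeffs = nn.Parameter(torch.zeros(num_features, order + 1))
        with torch.no_grad():
            self.coeffs[:, 1] = 1.0  # start near the identity map

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Squash inputs into [-1, 1], where the Chebyshev basis is defined.
        t = torch.tanh(x)
        # Build T_0..T_k via the recurrence T_n = 2*t*T_{n-1} - T_{n-2}.
        basis = [torch.ones_like(t), t]
        for _ in range(2, self.order + 1):
            basis.append(2.0 * t * basis[-1] - basis[-2])
        basis = torch.stack(basis, dim=-1)        # (..., features, order+1)
        return (basis * self.coeffs).sum(dim=-1)  # learned polynomial expansion

# Usage: drop the layer into an ordinary feed-forward stack.
model = nn.Sequential(nn.Linear(16, 32), SelfAdaptiveLayer(32), nn.Linear(32, 1))
out = model(torch.randn(8, 16))
```

Because the Chebyshev polynomials are orthogonal on [-1, 1], the learned coefficients are only weakly coupled, which is the kind of linear-algebraic structure the abstract appeals to; raising `order` adds higher-degree terms and tightens the approximation at the cost of more parameters.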
