The Study of Architecture MLP with Linear Neurons in Order to Eliminate the "Vanishing Gradient" Problem

Research on deep neural networks is becoming increasingly popular in artificial intelligence. While such networks are very powerful, they are difficult to train, and the main reason for these difficulties is the vanishing gradient problem, which worsens as the number of layers increases. The paper discusses the capabilities of different neural network architectures and proposes a new multilayer architecture with additional linear neurons that is much easier to train than a traditional MLP network and reduces the effect of vanishing gradients. The efficiency of the suggested approach has been confirmed by several experiments.
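To make the idea concrete, below is a minimal sketch of one possible reading of such an architecture: each hidden layer keeps a few identity-activation (linear) neurons next to the usual nonlinear units, so part of the signal and its gradient can pass through each layer without saturating. This is only an illustration under that assumption, not the paper's exact construction; the class name LinearAugmentedMLP and the parameter n_linear are hypothetical.

# Illustrative sketch (PyTorch): an MLP whose hidden layers mix tanh neurons
# with a few linear (identity-activation) neurons. The linear units provide a
# non-saturating path for gradients. Names and sizes are assumptions.
import torch
import torch.nn as nn


class LinearAugmentedMLP(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim, depth, n_linear=4):
        super().__init__()
        self.layers = nn.ModuleList()
        width = hidden_dim + n_linear          # nonlinear + linear units per hidden layer
        prev = in_dim
        for _ in range(depth):
            self.layers.append(nn.Linear(prev, width))
            prev = width
        self.readout = nn.Linear(prev, out_dim)
        self.hidden_dim = hidden_dim           # first hidden_dim units get tanh

    def forward(self, x):
        for layer in self.layers:
            z = layer(x)
            # apply tanh only to the nonlinear part; leave the linear neurons untouched
            x = torch.cat([torch.tanh(z[:, :self.hidden_dim]),
                           z[:, self.hidden_dim:]], dim=1)
        return self.readout(x)


# Usage: a forward pass through a deep (8-layer) instance on a dummy batch.
model = LinearAugmentedMLP(in_dim=10, hidden_dim=32, out_dim=1, depth=8)
y = model(torch.randn(16, 10))

Because the linear neurons never saturate, the product of layer Jacobians retains components close to identity, which is the intuition behind why training such a network should be easier than training a plain tanh MLP of the same depth.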
