Learning architectures with enhanced capabilities and easier training

Although the discovery of the Error Back Propagation (EBP) learning algorithm was a real breakthrough, EBP is not only very slow but also incapable of training networks with super compact architectures. The most noticeable progress came with the adaptation of the Levenberg-Marquardt (LM) algorithm to neural network training. The LM algorithm can train networks in 100 to 1000 times fewer iterations, but the size of the problems it can handle is significantly limited, and it was adapted primarily for traditional MLP architectures. More recently, two new revolutionary concepts were developed: Support Vector Machines (SVM) and Extreme Learning Machines (ELM). They are very fast, but they train only shallow networks with a single hidden layer, and it has been shown that such shallow networks have very limited capabilities. It has already been demonstrated that super compact architectures offer 10 to 100 times more processing power than commonly used learning architectures. For example, a shallow MLP architecture with 10 neurons can solve only a Parity-9 problem, while a special deep FCC (Fully Connected Cascade) architecture with the same 10 neurons can solve a problem as large as Parity-1023. Unfortunately, because of the vanishing gradient problem, deep architectures are very difficult to train. By introducing additional connections across layers, it was possible to efficiently train deep networks using the powerful NBN (Neuron-by-Neuron) algorithm. Our early results show that there is a solution for this difficult problem.
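To make the Parity-N capacity claim concrete, the following is a minimal sketch of a fully connected cascade (FCC) network with hard-threshold neurons. The hand-crafted weights and the helper function `fcc_parity` are illustrative assumptions, not the construction or trained weights from the work described above; they simply show how cross-layer connections let N cascaded neurons handle Parity-(2^N − 1), e.g. 10 neurons for Parity-1023.

```python
# A minimal sketch (illustrative, not the authors' solution) of a fully
# connected cascade (FCC) network with hard-threshold neurons solving
# Parity-N using only ceil(log2(N+1)) neurons.

from itertools import product
from math import ceil, log2


def fcc_parity(x):
    """Forward pass of an FCC network computing the parity of bit vector x.

    Each neuron sees all network inputs (through their sum s) plus the
    outputs of every earlier neuron (cross-layer connections), which is
    what makes the cascade architecture so compact.
    """
    n = len(x)
    num_neurons = ceil(log2(n + 1))       # e.g. Parity-7 needs 3 neurons
    s = sum(x)                            # all input weights are +1
    hidden = []
    for k in range(1, num_neurons + 1):
        # cross connection: earlier neuron j feeds neuron k with weight -2**(num_neurons - j)
        net = s - sum(2 ** (num_neurons - j) * hidden[j - 1] for j in range(1, k))
        net -= 2 ** (num_neurons - k) - 0.5   # bias acting as a threshold
        hidden.append(1 if net > 0 else 0)    # hard-threshold activation
    return hidden[-1]                          # last cascaded neuron is the output


# Exhaustive check for Parity-7: 3 cascaded neurons reproduce sum(x) mod 2.
assert all(fcc_parity(x) == sum(x) % 2 for x in product([0, 1], repeat=7))
```

Each neuron "folds" the input sum in half, so the usable input range doubles with every added neuron; removing the cross-layer connections would collapse this capacity back to that of a shallow network.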
