Formal Derivation of Mesh Neural Networks with Their Forward-Only Gradient Propagation

This paper proposes the Mesh Neural Network (MNN), a novel architecture that allows neurons to be connected in any topology so that information can be routed efficiently. In MNNs, information is propagated between neurons through a state transition function. State and error gradients are then computed directly from the state updates, without any backward computation. The MNN architecture and its error propagation scheme are formalized and derived in tensor algebra. The proposed computational model can fully drive a gradient descent process and, thanks to its expressivity and training efficiency compared with NNs based on back-propagation and computational graphs, is suitable for very large scale NNs.
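
To illustrate the idea of computing gradients alongside the forward state updates, the following is a minimal NumPy sketch, not the paper's exact tensor-algebra derivation. It assumes a fully connected mesh with state vector `s`, weight matrix `W`, external input `x`, a tanh state transition, and a hypothetical helper `mesh_step` that propagates the sensitivity tensor J[k, i, j] = ds_k/dW_ij forward together with the state.

```python
import numpy as np

def mesh_step(W, s, x, J):
    """One mesh state transition s' = tanh(W @ s + x).

    J[k, i, j] holds the sensitivity d s_k / d W_ij and is updated
    forward, together with the state, instead of via back-propagation.
    (Illustrative forward-mode accumulation, not the paper's exact scheme.)
    """
    n = s.size
    s_new = np.tanh(W @ s + x)
    dtanh = 1.0 - s_new ** 2                      # tanh'(pre), expressed via the new state
    # d pre_k / d W_ij = delta_{ki} * s_j  +  sum_m W_km * J[m, i, j]
    direct = np.zeros((n, n, n))
    direct[np.arange(n), np.arange(n), :] = s     # delta_{ki} * s_j term
    carried = np.einsum('km,mij->kij', W, J)      # sensitivity carried through the mesh
    return s_new, dtanh[:, None, None] * (direct + carried)

# Toy usage: run a few transitions and read off dL/dW for L = 0.5 * ||s - target||^2
rng = np.random.default_rng(0)
n = 8
W = rng.normal(scale=0.3, size=(n, n))            # dense mesh used here for simplicity
x = rng.normal(size=n)
target = rng.normal(size=n)

s, J = np.zeros(n), np.zeros((n, n, n))
for _ in range(5):
    s, J = mesh_step(W, s, x, J)

dL_ds = s - target                                # gradient of the loss w.r.t. the final state
dL_dW = np.einsum('k,kij->ij', dL_ds, J)          # forward-accumulated gradient, no backward pass
```

Under these assumptions, the weight gradient is obtained by contracting the loss gradient with the forward-accumulated sensitivity tensor after the last step, so no backward pass over the computation history is needed; the dense O(n^3) tensor is kept only for clarity of exposition.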
