Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science

Owing to the success of deep learning in various domains, artificial neural networks are currently among the most widely used artificial intelligence methods. Taking inspiration from the network properties of biological neural networks (e.g. sparsity and scale-freeness), we argue that, contrary to general practice, artificial neural networks too should not have fully-connected layers. Here we propose sparse evolutionary training of artificial neural networks, an algorithm which evolves an initial sparse topology (an Erdős–Rényi random graph) between two consecutive layers of neurons into a scale-free topology during learning. Our method replaces the fully-connected layers of artificial neural networks with sparse ones before training, quadratically reducing the number of parameters with no decrease in accuracy. We demonstrate our claims on restricted Boltzmann machines, multi-layer perceptrons, and convolutional neural networks for unsupervised and supervised learning on 15 datasets. Our approach has the potential to enable artificial neural networks to scale up beyond what is currently possible.

Artificial neural networks are artificial intelligence computing methods inspired by biological neural networks. Here the authors propose a method to design neural networks as sparse, scale-free networks, which leads to a reduction in the computational time required for training and inference.
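To make the sparse-connectivity idea above concrete, the sketch below (plain NumPy) initializes an Erdős–Rényi sparse mask for a single layer and then applies a prune-and-regrow step between training epochs: the connections whose weights are closest to zero are removed, and an equal number of new connections are grown at random empty positions, keeping the parameter count constant. The parameter names (`epsilon`, `zeta`), the default values, and the choice to start regrown connections from zero are illustrative assumptions for this sketch, not the authors' exact implementation.

```python
import numpy as np

def erdos_renyi_mask(n_in, n_out, epsilon=20, rng=None):
    """Binary sparse mask of shape (n_in, n_out).

    Connection probability p = epsilon * (n_in + n_out) / (n_in * n_out),
    so the layer holds O(n_in + n_out) instead of O(n_in * n_out) weights.
    `epsilon` is an illustrative sparsity knob.
    """
    rng = rng or np.random.default_rng()
    p = min(1.0, epsilon * (n_in + n_out) / (n_in * n_out))
    return (rng.random((n_in, n_out)) < p).astype(np.float64)

def evolve_mask(weights, mask, zeta=0.3, rng=None):
    """One prune-and-regrow step on a sparse layer.

    Removes the fraction `zeta` of active connections with the smallest
    weight magnitudes, then regrows the same number of connections at
    random positions that are currently empty.
    """
    rng = rng or np.random.default_rng()
    active = np.argwhere(mask > 0)
    n_prune = int(zeta * len(active))
    if n_prune == 0:
        return mask
    # Prune the weakest (smallest-magnitude) active connections.
    magnitudes = np.abs(weights[active[:, 0], active[:, 1]])
    weakest = active[np.argsort(magnitudes)[:n_prune]]
    mask[weakest[:, 0], weakest[:, 1]] = 0.0
    # Regrow the same number of connections at random empty positions.
    empty = np.argwhere(mask == 0)
    new = empty[rng.choice(len(empty), size=n_prune, replace=False)]
    mask[new[:, 0], new[:, 1]] = 1.0
    return mask

# Usage sketch: a 784-to-1000 sparse layer, evolved once per "epoch".
rng = np.random.default_rng(0)
mask = erdos_renyi_mask(784, 1000, rng=rng)
weights = rng.normal(scale=0.01, size=(784, 1000)) * mask
for epoch in range(5):
    # ... train `weights` here, applying the mask after each update ...
    mask = evolve_mask(weights, mask, zeta=0.3, rng=rng)
    weights *= mask  # newly grown connections start from zero in this sketch
```

In practice the training step would be an ordinary gradient (or contrastive-divergence) update with the mask reapplied to keep pruned positions at zero; the repeated prune-and-regrow cycle is what lets the initially random sparse topology adapt during learning.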
