On the Complexity of Neural Network Classifiers: A Comparison Between Shallow and Deep Architectures

Recently, researchers in the artificial neural network field have focused their attention on connectionist models composed of several hidden layers. Experimental results and heuristic considerations suggest that deep architectures are more suitable than shallow ones for modern applications that face very complex problems, e.g., vision and human language understanding. However, the theoretical results supporting such a claim are still few and incomplete. In this paper, we propose a new approach to studying how the depth of feedforward neural networks affects their ability to implement high-complexity functions. First, we introduce a new measure, based on topological concepts, for evaluating the complexity of the function implemented by a neural network used for classification. Then, we compare deep and shallow architectures with common sigmoidal activation functions by deriving upper and lower bounds on their complexity, and we study how the complexity depends on the number of hidden units and on the activation function. The obtained results support the idea that deep networks implement functions of higher complexity than shallow ones and, hence, with the same amount of resources, can address more difficult problems.
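To make the object of study concrete, here is a minimal, illustrative sketch; it is not the measure or the bounds derived in the paper. It builds randomly weighted shallow and deep tanh networks with the same total number of hidden units and counts the connected components of the positive decision region {x : f(x) >= 0} on a 2-D grid, a crude proxy for the zeroth Betti number b0 of the region a classifier carves out. The helper names (random_net, forward, b0_estimate) and all parameter choices are ours; only numpy and scipy.ndimage.label are assumed.

import numpy as np
from scipy.ndimage import label

rng = np.random.default_rng(0)

def random_net(layer_sizes):
    """Random weights and biases for a feedforward net with the given layer sizes."""
    return [(rng.standard_normal((m, n)), rng.standard_normal(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x):
    """tanh hidden layers, linear scalar output f(x)."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return (x @ W + b).squeeze(-1)

def b0_estimate(params, grid=400, lo=-3.0, hi=3.0):
    """Count connected components of {x : f(x) >= 0} sampled on a 2-D grid."""
    t = np.linspace(lo, hi, grid)
    X = np.stack(np.meshgrid(t, t), axis=-1).reshape(-1, 2)
    mask = (forward(params, X) >= 0).reshape(grid, grid)
    _, n_components = label(mask)   # grid-based proxy for b0
    return n_components

# Same budget of 16 hidden units: one wide layer vs. four narrow layers.
shallow = random_net([2, 16, 1])
deep    = random_net([2, 4, 4, 4, 4, 1])
print("shallow b0 ~", b0_estimate(shallow))
print("deep    b0 ~", b0_estimate(deep))

With random weights the counts fluctuate from run to run, so the sketch only illustrates what is being measured; the paper's comparison rests on worst-case upper and lower bounds over all weight assignments, not on sampled instances.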
