Chapter 5 – Deep Architectures

This chapter covers the foundations of feedforward neural networks and incorporates recent developments in deep learning, which has become a central topic in machine learning. On the foundational side, the chapter draws on computational geometry, circuit theory, circuit complexity, approximation theory, optimization theory, and statistics. A careful analysis is given of the fundamental representational advantages of deep architectures. In particular, the distinct roles that nonlinear activation functions (including the rectifier) play in representation and in learning are discussed. A section is devoted to convolutional networks, which are presented in a framework that emphasizes the extraction of invariant features. The chapter then presents gradient-descent learning together with the backpropagation algorithm, formulated for the general case of directed acyclic graphs and shown to possess a distinguishing optimality property. Finally, a number of optimization issues are covered, including premature saturation and local minima.
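
A minimal sketch of the rectifier activation and gradient-descent backpropagation summarized above, not the chapter's more general formulation for directed acyclic graphs: it trains a one-hidden-layer network on toy XOR data, with the layer sizes, learning rate, and iteration count chosen here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy XOR data: a classic problem that no single linear unit can separate.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # inputs, shape (4, 2)
y = np.array([[0.], [1.], [1.], [0.]])                  # targets, shape (4, 1)

# Parameters for input -> hidden (ReLU) -> output (sigmoid); sizes are assumptions.
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)

lr = 0.5  # learning rate, hand-picked for this toy problem

for step in range(5000):
    # Forward pass.
    z1 = X @ W1 + b1
    h = np.maximum(z1, 0.0)          # rectifier (ReLU) activation
    z2 = h @ W2 + b2
    p = 1.0 / (1.0 + np.exp(-z2))    # sigmoid output

    # Backward pass: gradients of the mean squared error L = mean((p - y)^2),
    # using sigmoid'(z2) = p (1 - p) and ReLU'(z1) = 1 where z1 > 0, else 0.
    dz2 = 2.0 * (p - y) * p * (1.0 - p) / len(X)
    dW2 = h.T @ dz2;  db2 = dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (z1 > 0)
    dW1 = X.T @ dz1;  db1 = dz1.sum(axis=0)

    # Plain gradient-descent update.
    W2 -= lr * dW2;  b2 -= lr * db2
    W1 -= lr * dW1;  b1 -= lr * db1

print(np.round(p, 3))  # outputs should approach [0, 1, 1, 0]
```

Note that the ReLU derivative is simply an indicator on the pre-activation sign, which is one reason the rectifier behaves differently from saturating activations such as the sigmoid during learning: its gradient does not vanish on the active side, though units whose pre-activation stays negative receive no gradient at all.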
