Comparisons of Single- and Multiple-Hidden-Layer Neural Networks

In this study we conduct fair and systematic comparisons of two types of neural networks: single- and multiple-hidden-layer networks. To make the comparisons fair, we ensure that the two types use the same activation and output functions and have the same numbers of nodes, feedforward connections, and parameters. The networks are trained by the gradient descent algorithm to approximate linear and quadratic functions, and we examine their convergence properties. We show that, for both linear and quadratic target functions, gradient descent converges over a wider range of learning rates in single-hidden-layer networks than in multiple-hidden-layer networks. We also show that single-hidden-layer networks converge to linear target functions faster than multiple-hidden-layer networks do.
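The kind of matched-parameter comparison described above can be illustrated with a minimal sketch. The code below trains a 1-7-1 network and a 1-3-3-1 network, sized so that both have exactly 22 trainable parameters, by full-batch gradient descent on a linear target. The specific architectures, sigmoid hidden activations, linear output, learning rate, and target function are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_net(layer_sizes):
    """Initialize weights and biases for a fully connected feedforward net."""
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, 0.5, size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def forward(params, x):
    """Sigmoid hidden layers, linear output; returns activations per layer."""
    acts = [x]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        a = z if i == len(params) - 1 else 1.0 / (1.0 + np.exp(-z))
        acts.append(a)
    return acts

def grad_step(params, x, y, lr):
    """One full-batch gradient-descent step on 0.5 * mean squared error."""
    acts = forward(params, x)
    delta = (acts[-1] - y) / len(x)      # dLoss/dOutput for a linear output
    for i in reversed(range(len(params))):
        W, b = params[i]
        gW = acts[i].T @ delta
        gb = delta.sum(axis=0)
        if i > 0:                        # backprop through the sigmoid layer
            delta = (delta @ W.T) * acts[i] * (1.0 - acts[i])
        params[i] = (W - lr * gW, b - lr * gb)
    return 0.5 * np.mean((acts[-1] - y) ** 2)

# Parameter counts match: 1-7-1 has 3*7 + 1 = 22 parameters,
# and 1-3-3-1 has 2*3 + (3+1)*3 + (3+1)*1 = 22 as well.
x = rng.uniform(-1.0, 1.0, size=(200, 1))
y = 2.0 * x + 1.0                        # linear target (illustrative choice)

shallow = init_net([1, 7, 1])
deep = init_net([1, 3, 3, 1])

for step in range(2001):
    loss_s = grad_step(shallow, x, y, lr=0.5)
    loss_d = grad_step(deep, x, y, lr=0.5)
    if step % 500 == 0:
        print(f"step {step:4d}  shallow {loss_s:.5f}  deep {loss_d:.5f}")
```

Comparing the printed losses across runs with different learning rates gives a rough sense of the convergence behavior the paper studies, though the paper's analysis is theoretical rather than purely empirical.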
