On model selection and the inability of neural networks to decompose tasks

A neural network with fixed topology can be regarded as a parametrization of functions, one that determines how functional variations are correlated when the parameters are adapted. We propose an analysis, based on differential geometry, that allows one to calculate these correlations. In practice, they describe how one response is unlearned while another is trained. For conventional feed-forward neural networks we find that they generically introduce strong correlations, are predisposed to forgetting, and are inappropriate for task decomposition. Perspectives on how to solve these problems are discussed.
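As an illustration (a minimal sketch, not the paper's formalism; the tiny network, the test inputs, and the gradient-inner-product measure are assumptions made here for exposition), one concrete way to probe the correlation between two responses f(x) and f(y) under gradient learning is the inner product of their parameter gradients: if it is far from zero, any training step that adjusts the output at x necessarily also moves the output at y.

```python
# Minimal sketch: measure how strongly the responses of a small MLP at
# two inputs are coupled under parameter adaptation, via the normalized
# inner product of their parameter gradients <grad_w f(x), grad_w f(y)>.
# A value near 0 would permit decomposed learning; a value near +/-1
# means training one response drags the other along (interference).

import numpy as np

rng = np.random.default_rng(0)

# Tiny 1-2-1 feed-forward network with tanh hidden units.
W1, b1 = rng.normal(size=(2, 1)), rng.normal(size=2)
W2, b2 = rng.normal(size=(1, 2)), rng.normal(size=1)

def forward(x):
    h = np.tanh(W1 @ x + b1)
    return (W2 @ h + b2)[0], h

def grad_params(x):
    """Gradient of the scalar output w.r.t. all parameters, flattened."""
    _, h = forward(x)
    dW2 = h                        # df/dW2
    db2 = np.ones(1)               # df/db2
    dh = W2[0] * (1.0 - h**2)      # backprop through tanh
    dW1 = np.outer(dh, x)          # df/dW1
    db1 = dh                       # df/db1
    return np.concatenate([dW1.ravel(), db1, dW2.ravel(), db2])

def correlation(x, y):
    """Cosine of the angle between functional variations at x and y."""
    gx = grad_params(np.atleast_1d(x))
    gy = grad_params(np.atleast_1d(y))
    return gx @ gy / (np.linalg.norm(gx) * np.linalg.norm(gy))

# For generic small feed-forward networks this is typically far from
# zero, the geometric signature of the interference described above.
print(correlation(0.3, -0.7))
```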
