Applying Ockham's Razor to back-propagation

Back-propagation learning (bp) is known for its serious limitations in generalising knowledge from certain types of learning material. In this paper we describe a new learning algorithm, bp-som, which overcomes some of these limitations, as is shown by its application to four benchmark tasks. bp-som is a combination of a multi-layered feedforward network (mfn) trained with bp and Kohonen's self-organising maps (soms). During the learning process, hidden-unit activations of the mfn are presented as learning vectors to soms trained in parallel. Information on classification errors contained in the maps is used when updating the connection weights of the network, in addition to standard error back-propagation. The additional update leads to an increasing similarity of hidden-unit activation patterns associated with the same output class. In a number of experiments, bp-som is shown (i) to improve generalisation performance over that of bp on tasks for which bp exhibits overfitting; (ii) to provide indications of the number of hidden layers needed to learn a task; (iii) to increase the number of hidden units that can be pruned without loss of generalisation performance; and (iv) to provide a means for automatic knowledge abstraction from trained networks. From these experiments, we conclude that the hybrid bp-som architecture and learning algorithm combines the advantages of the bp and som learning algorithms.
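The abstract describes the combined update only informally. The Python sketch below illustrates one way such a training step could be organised; it is a minimal illustration under stated assumptions, not the authors' exact algorithm. The network sizes, the mixing weight `alpha`, the SOM labelling by the last class seen, and all function names are assumptions introduced here. The intended point is only that the hidden-layer error signal mixes the standard back-propagation term with a pull of the hidden activation vector towards a class-consistent SOM prototype, which is what drives activation patterns of the same class to become more similar.

```python
# Sketch of a BP-SOM-style training step (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)

# --- tiny MFN: input -> hidden -> output --------------------------------
n_in, n_hid, n_out = 8, 6, 2
W1 = rng.normal(0, 0.3, (n_hid, n_in))
W2 = rng.normal(0, 0.3, (n_out, n_hid))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# --- SOM trained in parallel on hidden-unit activation vectors ----------
som_size = 5                                    # 5x5 map (assumption)
som = rng.normal(0.5, 0.1, (som_size * som_size, n_hid))
som_labels = np.full(som_size * som_size, -1)   # class label per map cell

def best_matching_unit(h):
    # index of the SOM prototype closest to the hidden activation vector
    return int(np.argmin(np.linalg.norm(som - h, axis=1)))

def bp_som_step(x, target, label, lr=0.1, som_lr=0.05, alpha=0.25):
    global W1, W2
    # forward pass through the MFN
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)

    # standard error back-propagation terms
    delta_out = (target - y) * y * (1 - y)
    delta_hid_bp = (W2.T @ delta_out) * h * (1 - h)

    # SOM component: pull the hidden pattern towards the prototype of the
    # best-matching cell, but only when that cell carries the same class
    # label (a simple stand-in for the map's class-error information)
    bmu = best_matching_unit(h)
    delta_hid_som = np.zeros_like(h)
    if som_labels[bmu] == label:
        delta_hid_som = (som[bmu] - h) * h * (1 - h)

    # combined hidden-layer error signal and weight updates
    delta_hid = (1 - alpha) * delta_hid_bp + alpha * delta_hid_som
    W2 += lr * np.outer(delta_out, h)
    W1 += lr * np.outer(delta_hid, x)

    # update the SOM itself on the same hidden activation vector
    som[bmu] += som_lr * (h - som[bmu])
    som_labels[bmu] = label                     # crude labelling: last class seen
    return float(np.sum((target - y) ** 2))

# one illustrative step on a random example of class 1
x = rng.random(n_in)
t = np.array([0.0, 1.0])
print(bp_som_step(x, t, label=1))
```

In this sketch `alpha` plays the role of the balance between the standard back-propagation term and the map-driven term; the paper's own update rule and its reliability weighting may differ in detail.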
