Applying Ockham's Razor to back-propagation

Back-propagation learning (bp) is known for its serious limitations in generalising knowledge from certain types of learning material. In this paper we describe a new learning algorithm, bp-som, which overcomes some of these limitations, as is shown by its application to four benchmark tasks. bp-som is a combination of a multi-layered feedforward network (mfn) trained with bp and Kohonen's self-organising maps (soms). During the learning process, hidden-unit activations of the mfn are presented as learning vectors to soms trained in parallel. Information on classification errors contained in the maps is used when updating the connection weights of the network, in addition to standard error back-propagation. The additional update leads to an increasing similarity of hidden-unit activation patterns associated with the same output class. In a number of experiments, bp-som is shown (i) to improve generalisation performance over that of bp on tasks for which bp exhibits overfitting; (ii) to provide indications of the number of hidden layers needed to learn a task; (iii) to increase the number of hidden units that can be pruned without loss of generalisation performance; and (iv) to provide a means for automatic knowledge abstraction from trained networks. From these experiments, we conclude that the hybrid bp-som architecture and learning algorithm combines the advantages of the bp and som learning algorithms.
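The abstract describes the combined update only informally. The Python sketch below illustrates one way such a training step could be organised; it is a minimal illustration under stated assumptions, not the authors' exact algorithm. The network sizes, the mixing weight `alpha`, the SOM labelling by the last class seen, and all function names are assumptions introduced here. The intended point is only that the hidden-layer error signal mixes the standard back-propagation term with a pull of the hidden activation vector towards a class-consistent SOM prototype, which is what drives activation patterns of the same class to become more similar.

```python
# Sketch of a BP-SOM-style training step (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)

# --- tiny MFN: input -> hidden -> output --------------------------------
n_in, n_hid, n_out = 8, 6, 2
W1 = rng.normal(0, 0.3, (n_hid, n_in))
W2 = rng.normal(0, 0.3, (n_out, n_hid))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# --- SOM trained in parallel on hidden-unit activation vectors ----------
som_size = 5                                    # 5x5 map (assumption)
som = rng.normal(0.5, 0.1, (som_size * som_size, n_hid))
som_labels = np.full(som_size * som_size, -1)   # class label per map cell

def best_matching_unit(h):
    # index of the SOM prototype closest to the hidden activation vector
    return int(np.argmin(np.linalg.norm(som - h, axis=1)))

def bp_som_step(x, target, label, lr=0.1, som_lr=0.05, alpha=0.25):
    global W1, W2
    # forward pass through the MFN
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)

    # standard error back-propagation terms
    delta_out = (target - y) * y * (1 - y)
    delta_hid_bp = (W2.T @ delta_out) * h * (1 - h)

    # SOM component: pull the hidden pattern towards the prototype of the
    # best-matching cell, but only when that cell carries the same class
    # label (a simple stand-in for the map's class-error information)
    bmu = best_matching_unit(h)
    delta_hid_som = np.zeros_like(h)
    if som_labels[bmu] == label:
        delta_hid_som = (som[bmu] - h) * h * (1 - h)

    # combined hidden-layer error signal and weight updates
    delta_hid = (1 - alpha) * delta_hid_bp + alpha * delta_hid_som
    W2 += lr * np.outer(delta_out, h)
    W1 += lr * np.outer(delta_hid, x)

    # update the SOM itself on the same hidden activation vector
    som[bmu] += som_lr * (h - som[bmu])
    som_labels[bmu] = label                     # crude labelling: last class seen
    return float(np.sum((target - y) ** 2))

# one illustrative step on a random example of class 1
x = rng.random(n_in)
t = np.array([0.0, 1.0])
print(bp_som_step(x, t, label=1))
```

In this sketch `alpha` plays the role of the balance between the standard back-propagation term and the map-driven term; the paper's own update rule and its reliability weighting may differ in detail.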
