Two Strategies to Avoid Overfitting in Feedforward Networks

Abstract We present a new network topology for avoiding overfitting in two-layered feedforward networks. Two additional linear layers and principal component analysis are used to reduce the dimension of both the inputs and the internal representations while transmitting the essential information. Neurons whose outputs have small variance are thereby removed, which yields better generalization. Our network and learning rules can also be viewed as a procedure for reducing the number of free parameters without using second-order information about the error function. As a second strategy we derive a penalty term that drives the network to keep the variances of the hidden-layer outputs small. Experimental results show that this limits the transmitted information, which reduces noise and improves generalization. The variances of the hidden neurons' outputs are again used as a pruning criterion. © 1997 Elsevier Science Ltd. All Rights Reserved.
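The second strategy described above (a variance penalty on the hidden-layer outputs, with low-variance units later pruned) can be illustrated with a minimal sketch. This is not the authors' implementation; the network shapes, the penalty weight, the threshold, and helper names such as `variance_penalty` and `prune_low_variance_units` are assumptions chosen only to make the idea concrete.

```python
# Minimal sketch (assumed details, not the paper's code): a penalty on the
# variance of hidden-layer outputs for a two-layer network, plus pruning of
# hidden neurons whose output variance is small.
import numpy as np

rng = np.random.default_rng(0)

def forward(X, W1, b1, W2, b2):
    H = np.tanh(X @ W1 + b1)   # hidden-layer outputs
    Y = H @ W2 + b2            # linear output layer
    return H, Y

def variance_penalty(H):
    # Penalty term that grows with each hidden unit's output variance;
    # added to the loss, it drives the network toward small hidden variances.
    return np.sum(np.var(H, axis=0))

def prune_low_variance_units(H, W1, b1, W2, threshold=1e-3):
    # Pruning criterion from the abstract: remove hidden neurons whose output
    # variance over the training set is small (they transmit little information).
    keep = np.var(H, axis=0) > threshold
    return W1[:, keep], b1[keep], W2[keep, :]

# Toy usage on random data (targets set to zero just to form a loss).
X = rng.normal(size=(100, 5))
W1, b1 = 0.1 * rng.normal(size=(5, 10)), np.zeros(10)
W2, b2 = 0.1 * rng.normal(size=(10, 1)), np.zeros(1)

H, Y = forward(X, W1, b1, W2, b2)
loss = np.mean(Y ** 2) + 0.01 * variance_penalty(H)
W1, b1, W2 = prune_low_variance_units(H, W1, b1, W2)
```

In a full training loop the penalized loss would be minimized by backpropagation, and pruning would be applied after training; the penalty weight (0.01 here) and the variance threshold are hyperparameters, not values given in the abstract.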