This paper analyses the intrinsic relationship between the learning ability and the generalization ability of a BP network, together with other influencing factors, when overfitting occurs, and introduces the multiple correlation coefficient to describe the complexity of samples. Following the calculation uncertainty principle and the minimum principle of neural network structural design, and by analogy with the general uncertainty relation in the information transfer process, it establishes an uncertainty relation between the training relative error on the training sample set, which reflects the network's learning ability, and the test relative error on the test sample set, which represents its generalization ability. Simulated overfitting tests of BP network numerical modeling on different types of functions show that the overfitting parameter q in this relation generally ranges from 7 × 10⁻³ to 7 × 10⁻². From the uncertainty relation, the paper derives a formula for the number of hidden nodes of a network with good generalization ability, under the condition that the multiple correlation coefficient describes sample complexity and the given approximation error requirement is satisfied; the rationality of this formula is verified. The paper also points out that, when a BP network is trained on a given sample set, stopping training is the best method for improving generalization ability.
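The two quantities named above can be illustrated with a minimal sketch. It assumes the standard statistical definition of the multiple correlation coefficient (the square root of R² from an ordinary least-squares fit with intercept) and takes "relative error" to mean the mean absolute relative error; the abstract does not state the paper's exact definitions, so both are assumptions, and the function names `multiple_correlation` and `mean_relative_error` are hypothetical.

```python
import numpy as np

def multiple_correlation(X, y):
    """Multiple correlation coefficient R between y and the columns of X.

    Assumption: R is the square root of R^2 from a linear least-squares
    fit with an intercept. y must not be constant (ss_tot would be zero).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with an intercept column.
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ coef
    ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    # Clamp at 0 to guard against tiny negative values from rounding.
    return float(np.sqrt(max(0.0, 1.0 - ss_res / ss_tot)))

def mean_relative_error(y_true, y_pred):
    """Mean of |y_pred - y_true| / |y_true| (assumed definition).

    Requires every entry of y_true to be nonzero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) / np.abs(y_true)))

# A linear target is fully explained by a linear fit, so R is close to 1;
# a sine over one full period is not, so R is clearly below 1.
x = np.linspace(0.0, 2.0 * np.pi, 200)
r_linear = multiple_correlation(x, 2.0 * x + 1.0)
r_sine = multiple_correlation(x, np.sin(x))
```

Under these assumptions, the same `mean_relative_error` function would be evaluated once on the training sample set and once on the test sample set to obtain the two errors that the uncertainty relation connects.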