Training error, generalization error and learning curves in neural learning
A neural network is trained on a set of available examples by minimizing the training error, so that the network parameters fit those examples well. What one actually wants to minimize, however, is the generalization error, to which no direct access is possible. The training error and the generalization error differ because of the statistical fluctuation of the examples, and the article studies this discrepancy from the statistical point of view. When the number of training examples is large, a universal asymptotic evaluation of the discrepancy between the two errors is available; it can be used for model selection based on an information criterion. When the number of training examples is small, the discrepancy is large, causing a serious overfitting or overtraining problem. We analyze this phenomenon using a simple model. Surprisingly, the generalization error can even increase as the number of examples increases in a certain range, which shows the inadequacy of the minimum-training-error learning method. We evaluate various means of overcoming overtraining, such as cross-validated early stopping of training, introduction of regularization terms, model selection, and others.
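As a rough sketch of the asymptotic regime mentioned above (assuming a faithful model with d free parameters trained on t independent examples under entropic loss; the symbols d, t, and R_opt are introduced here only for illustration), the expected training and generalization errors deviate from the optimal error R_opt by symmetric terms of order d/t:

\[
\langle R_{\mathrm{train}} \rangle \;\simeq\; R_{\mathrm{opt}} - \frac{d}{2t},
\qquad
\langle R_{\mathrm{gen}} \rangle \;\simeq\; R_{\mathrm{opt}} + \frac{d}{2t},
\qquad
\langle R_{\mathrm{gen}} \rangle - \langle R_{\mathrm{train}} \rangle \;\simeq\; \frac{d}{t}.
\]

Under these assumptions, correcting the observed training error by the d/t gap yields an AIC-like criterion, R_train + d/t, whose minimum over candidate architectures is one way to carry out the information-criterion-based model selection referred to in the abstract.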