Learning and generalization in feed-forward neural networks

Aspects of learning and generalization in feed-forward neural networks are studied. The networks are trained using the backpropagation learning algorithm, whose performance is evaluated on a training set whose difficulty can be varied; on this basis, improvements and modifications to the algorithm are suggested. A simple classification of problem-domain types is made, and one class is argued to be the most appropriate for a three-layer feed-forward network to learn. This class is characterized by underlying regularities among the training-set members, such that the mapping required for each pattern is consistent with the mappings required for all the other patterns. The suitability of this class of training sets is demonstrated by observing the emergent properties of the network, both in the speed and character of learning and in the generalization displayed after learning an incomplete training set. This behaviour is contrasted with that on training sets lacking such underlying regularities, from which it is concluded that this type of network is used more effectively for extracting salient information from a training set, provided underlying regularities exist, than for other classes of mappings. For such problem domains, the generalization of the network is studied as a function of hidden-layer size. It is shown that the number of distinct solutions available in the algorithm's search space generally grows rapidly with hidden-layer size; despite this, generalization performance does not degrade correspondingly but remains at a steady, high level. This observation suggests that during learning the network is more likely to extract the salient information about the training set than merely to map the patterns independently (which would account for a large set of the other possible solutions), and that this information is stored in a distributed manner throughout all the weights of the network.
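
As a rough illustration of the kind of experiment described above (a sketch under stated assumptions, not the paper's actual code), the following Python/NumPy fragment trains a three-layer network by backpropagation on an incomplete training set drawn from a mapping with an underlying regularity, then reports accuracy on the withheld patterns as the hidden layer is enlarged. The task (a 6-bit majority function), the 75% training split, the network sizes, and all hyperparameters are illustrative assumptions:

# Sketch only: a 3-layer (input-hidden-output) network trained by plain
# gradient-descent backpropagation on squared error. The task and all
# settings below are assumptions chosen to give a mapping in which every
# pattern is consistent with a single underlying rule.
import itertools
import numpy as np

rng = np.random.default_rng(0)

# All 64 six-bit patterns; target = 1 when more than half the bits are set.
X = np.array(list(itertools.product([0.0, 1.0], repeat=6)))
y = (X.sum(axis=1) > 3).astype(float).reshape(-1, 1)

# Incomplete training set: train on 75% of patterns, hold out the rest.
perm = rng.permutation(len(X))
n_train = int(0.75 * len(X))
train_idx, test_idx = perm[:n_train], perm[n_train:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run(n_hidden, epochs=3000, lr=1.0):
    # Small random initial weights; one hidden layer of n_hidden units.
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
    b2 = np.zeros(1)
    Xtr, ytr = X[train_idx], y[train_idx]
    for _ in range(epochs):
        # Forward pass through both layers.
        h = sigmoid(Xtr @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backward pass: error deltas propagated from output to hidden layer.
        d_out = (out - ytr) * out * (1.0 - out)
        d_h = (d_out @ W2.T) * h * (1.0 - h)
        W2 -= lr * (h.T @ d_out) / len(Xtr)
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * (Xtr.T @ d_h) / len(Xtr)
        b1 -= lr * d_h.mean(axis=0)
    # Generalization: accuracy on the patterns withheld from training.
    pred = sigmoid(sigmoid(X[test_idx] @ W1 + b1) @ W2 + b2) > 0.5
    return float((pred == (y[test_idx] > 0.5)).mean())

for n_hidden in (2, 4, 8, 16, 32):
    print(f"{n_hidden:3d} hidden units -> held-out accuracy {run(n_hidden):.2f}")

On a mapping of this kind one would expect, in line with the abstract's observation, the held-out accuracy to stay roughly level as the hidden layer grows, even though larger hidden layers admit many more weight configurations that fit the training patterns.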
