Evaluation of convolutional neural networks for visual recognition

Convolutional neural networks provide an efficient method to constrain the complexity of feedforward neural networks through weight sharing and restriction to local connections. This network topology has been applied in particular to image classification, where sophisticated preprocessing is to be avoided and raw images are to be classified directly. In this paper, two variations of convolutional networks (the neocognitron and a modification of the neocognitron) are compared with classifiers based on fully connected feedforward layers (i.e., multilayer perceptron, nearest neighbor classifier, auto-encoding network) with respect to their visual recognition performance. Besides the original neocognitron, a modification is proposed that combines perceptron-type neurons with the localized network structure of the neocognitron. Instead of training convolutional networks by time-consuming error backpropagation, this work applies a modular procedure in which layers are trained sequentially from the input layer to the output layer in order to recognize features of increasing complexity. For a quantitative experimental comparison with standard classifiers, two very different recognition tasks have been chosen: handwritten digit recognition and face recognition. In the first task, handwritten digit recognition, the generalization of convolutional networks is compared with that of fully connected networks. Several experiments determine the influence of variations in the position, size, and orientation of digits and examine the relation between training sample size and validation error. In the second task, recognition of human faces is investigated under constrained and variable conditions with respect to face orientation and illumination, and the limitations of convolutional networks are discussed.
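The complexity reduction attributed above to local connections and weight sharing can be illustrated by counting free parameters. The sketch below is only an illustration: the 28x28 input size, the 5x5 receptive field, and the 8 feature maps are assumed values chosen for the example and are not taken from the paper, and the function names are hypothetical.

```python
def conv_output_size(in_size, kernel, stride=1):
    """Side length of a feature map for an unpadded convolution."""
    return (in_size - kernel) // stride + 1


def dense_params(in_h, in_w, out_units):
    """Fully connected layer: every unit sees every pixel, plus one bias per unit."""
    return in_h * in_w * out_units + out_units


def local_params(in_h, in_w, kernel, maps):
    """Local connections WITHOUT weight sharing: each unit owns its kernel and bias."""
    units = conv_output_size(in_h, kernel) * conv_output_size(in_w, kernel) * maps
    return units * (kernel * kernel + 1)


def shared_params(kernel, maps):
    """Local connections WITH weight sharing: one kernel and one bias per feature map."""
    return maps * (kernel * kernel + 1)


if __name__ == "__main__":
    # Assumed sizes for illustration only (not from the paper).
    side, kernel, maps = 28, 5, 8
    units = conv_output_size(side, kernel) ** 2 * maps
    print("units in the layer:        ", units)
    print("fully connected parameters:", dense_params(side, side, units))
    print("locally connected only:    ", local_params(side, side, kernel, maps))
    print("local + shared weights:    ", shared_params(kernel, maps))
```

Under these assumed sizes, the layer has the same number of units in all three cases; only the number of independently trainable weights changes.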