Gradient-based learning applied to document recognition

Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training and the flexibility of graph transformer networks. A graph transformer network for reading bank cheques is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.
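As a rough illustration of why weight sharing helps with the variability of 2D shapes, the sketch below slides one small kernel over a toy image and checks that shifting the input shifts the feature map by the same amount. The helper names (`conv2d_valid`, `shift_right`), the toy image, and the edge-detecting kernel are assumptions made for this example only; they are not taken from the paper and do not reproduce its architecture.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image ('valid' cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def shift_right(image, k):
    """Shift every row k pixels to the right, padding with zeros on the left."""
    return [[0] * k + row[:len(row) - k] for row in image]


# Toy 5x6 image containing a short vertical stroke in column 1 (illustrative).
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
# Hypothetical 3x3 vertical-edge detector; the weights are an assumption.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

SHIFT = 2
fmap = conv2d_valid(image, kernel)
fmap_shifted = conv2d_valid(shift_right(image, SHIFT), kernel)

# Away from the left border, the response to the shifted image is the
# original response moved by the same number of columns.
equivariant = all(
    fmap_shifted[r][c] == fmap[r][c - SHIFT]
    for r in range(len(fmap))
    for c in range(SHIFT, len(fmap[0]))
)
print("feature map:        ", fmap)
print("shifted feature map:", fmap_shifted)
print("shift-equivariant away from the border:", equivariant)
```

Because the same weights are applied at every position, a feature detector learned at one location responds identically wherever the stroke appears, so translated inputs need no separate parameters.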
