4. Contextual Processing Using a Lexicon

A word recognition system has been developed at NIST to read free-formatted text paragraphs containing handprinted characters. The system has been developed and tested using samples of handprint from NIST Special Database 1. This database of binary images contains 2,100 different writers’ printings of the Preamble to the U.S. Constitution. Each writer was asked to print these sentences in an empty 70 mm by 175 mm box. The Constitution box contains no guidelines for the placement and spacing of the handprinted text, nor are there guidelines to instruct the writer where to stop printing one line and begin the next. While the layout of the handprint in these paragraphs is unconstrained, a limited-size lexicon may be applied to reduce the complexity of the recognition application. The Preamble contains 38 unique words composed of 35 unique upper-case and lower-case letters, ignoring punctuation marks.

The system is divided into four general components. 1) The Constitution box is located within a full-page image, and the handprint within the box is isolated. 2) The subimage containing the handprinted text is segmented using connected component labeling, and the resulting blobs are sorted into correct reading order. 3) The segmented blobs are classified using feature-based neural network recognition. 4) Words are parsed from the line-ordered classifications using the lexicon to locate word boundaries and to correct classification and segmentation errors. These components have been combined into an end-to-end hybrid system that executes across a UNIX file server and a massively parallel SIMD computer.

The recognition system achieves a word error rate of 49% across all 2,100 printings of the Preamble (109,096 words). This performance is achieved with a neural network character classifier that has a substitution error rate of 14% on its 22,823 training patterns. This demonstrates the power of using a limited-size lexicon to parse words from the output of a less-than-optimal character classifier. This paper discusses the word recognition system in detail.
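
To make the fourth component more concrete, the following is a minimal sketch of lexicon-driven word parsing. It is not the NIST implementation, which the text above does not specify. It assumes the classifier output arrives as an unbroken string of characters, and it chooses word boundaries by dynamic programming so that the total Levenshtein distance between the chunks and their nearest lexicon words is minimized. The names `edit_distance`, `parse_words`, and the `slack` parameter are hypothetical and introduced only for illustration.

```python
from functools import lru_cache


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def parse_words(chars, lexicon, slack=2):
    """Split `chars` into lexicon words with minimum total edit distance.

    `slack` (an assumed tuning knob) lets a chunk be a little longer than
    the longest lexicon word, to absorb spurious extra characters.
    """
    max_len = max(len(w) for w in lexicon) + slack

    @lru_cache(maxsize=None)
    def best_match(chunk):
        # Nearest lexicon word; ties are broken alphabetically.
        return min((edit_distance(chunk, w), w) for w in lexicon)

    n = len(chars)
    # best[i] holds (total cost, word list) for the prefix chars[:i].
    best = [(0, [])] + [None] * n
    for i in range(1, n + 1):
        candidates = []
        for j in range(max(0, i - max_len), i):
            cost, word = best_match(chars[j:i])
            prev_cost, prev_words = best[j]
            candidates.append((prev_cost + cost, prev_words + [word]))
        best[i] = min(candidates, key=lambda c: c[0])
    return best[n][1]


# One misclassified character ("c" for "e") and no word boundaries.
print(parse_words("wethcpeople", ["we", "the", "people"]))
# ['we', 'the', 'people']
```

Because every chunk is forced onto a lexicon entry, a single pass both locates word boundaries and corrects character errors, which is why a small closed vocabulary such as the 38 words of the Preamble can compensate for a weak character classifier. A real system would likely also weight the edit costs by classifier confusions, which this sketch does not attempt.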
