Public domain optical character recognition

The National Institute of Standards and Technology (NIST) has developed a public domain document processing system. It is a standard reference form-based handprint recognition system for evaluating optical character recognition (OCR), intended to provide a performance baseline on an open application. The system's source code, training data, and performance assessment tools are all publicly available, as are the types of forms it processes. The system recognizes the handprint entered on handwriting sample forms like those distributed with NIST Special Database 1. From these forms, it reads hand-printed numeric fields, uppercase and lowercase alphabetic fields, and unconstrained text paragraphs composed of words from a limited-size dictionary. The modular design makes the system useful for component evaluation and comparison, validation of training and testing sets, and multiple-system voting schemes. The system makes several contributions to OCR technology, including an optimized probabilistic neural network (PNN) classifier that runs 20 times faster than traditional software implementations of the algorithm. The source code is written in C and organized into 11 libraries, totaling approximately 19,000 lines of code and more than 550 subroutines. Source code is provided for form registration, form removal, field isolation, field segmentation, character normalization, feature extraction, character classification, and dictionary-based postprocessing. The recognition system has been successfully compiled and tested on a range of UNIX workstations. This paper gives an overview of the system's software architecture, describing the various system components and reporting timing and accuracy statistics.
