Using a partitioned dictionary for contextual post-processing of OCR-results

This paper describes an approach to partitioning large dictionaries for use in document analysis. It introduces the concept of virtual views on the dictionary. The architecture of the dictionary system is based on redundant hashing techniques. The system consists of two main modules: the dictionary generator and the dictionary controller. Tests comparing the dictionary with standard UNIX utilities show that dictionary look-up is very fast.
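The idea of virtual views over a redundantly hashed dictionary can be illustrated with a small sketch. It is not the system described in the paper: the class `PartitionedDictionary`, its three indexes (word length, first character, tag), and the `view` method are all hypothetical names chosen for illustration. The sketch assumes only that every entry is hashed under several keys, so that a view is computed on demand as the intersection of partitions rather than stored as a separate dictionary.

```python
from collections import defaultdict


class PartitionedDictionary:
    """Toy dictionary: one word store plus several redundant hash indexes.

    Hypothetical illustration only; the indexes chosen here are assumptions.
    """

    def __init__(self):
        self.words = set()
        # Redundant indexes: the same entry is hashed under different keys.
        self.by_length = defaultdict(set)
        self.by_first_char = defaultdict(set)
        self.by_tag = defaultdict(set)

    def add(self, word, tags=()):
        """Insert a non-empty word into the store and into every index."""
        if not word:
            raise ValueError("word must be non-empty")
        self.words.add(word)
        self.by_length[len(word)].add(word)
        self.by_first_char[word[0]].add(word)
        for tag in tags:
            self.by_tag[tag].add(word)

    def view(self, length=None, first_char=None, tag=None):
        """Return a virtual view: the intersection of the selected partitions."""
        parts = []
        if length is not None:
            parts.append(self.by_length.get(length, set()))
        if first_char is not None:
            parts.append(self.by_first_char.get(first_char, set()))
        if tag is not None:
            parts.append(self.by_tag.get(tag, set()))
        if not parts:
            # No constraint given: the view is the whole dictionary.
            return set(self.words)
        return set.intersection(*parts)

    def contains(self, word, **constraints):
        """Look up a word inside a view instead of the whole dictionary."""
        return word in self.view(**constraints)
```

In a post-processing setting, a recognizer that is confident about the word length and first character of an uncertain word could restrict the candidate search to `view(length=5, first_char="h")` instead of scanning every entry.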
