Degraded text recognition using visual and linguistic context

Postprocessing is critical to improving the performance of an OCR system on degraded images of text. Its objective is to correct errors or resolve ambiguities in OCR results by using contextual information. Postprocessing operates at different levels, depending on the extent of context used. In current commercial OCR systems, word-level postprocessing methods such as dictionary lookup have been applied successfully. However, many OCR errors cannot be corrected by word-level postprocessing. To overcome this limitation, passage-level postprocessing, which uses global contextual information, is necessary. This thesis addresses problems in degraded text recognition and discusses potential solutions through passage-level postprocessing. The objective is to develop a postprocessing methodology from a broader perspective.

In this work, two classes of inter-word contextual constraints, visual constraints and linguistic constraints, are exploited extensively. In a text page with hundreds of words, many word image instances are visually similar. Formally, six types of visual inter-word relations are defined. If the word images in the text have been interpreted correctly, relations at the image level must be consistent with relations at the symbolic level. Because OCR results often violate this consistency, methods of visual consistency analysis are designed to detect and correct OCR errors.

Linguistic knowledge sources such as lexicography, syntax, and semantics can also be used to detect and correct OCR errors. Here, we focus on the word candidate selection problem: the OCR provides several alternatives for each word, and the objective of postprocessing is to choose the correct one among them. Two approaches to linguistic analysis, statistical and structural, are proposed for candidate selection. A word-collocation-based relaxation algorithm and a probabilistic lattice parsing algorithm are presented.

Some OCR errors are not easily recoverable by either visual consistency analysis or linguistic consistency analysis. Integrating image analysis with language-level analysis provides a natural way to handle such difficult words.
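
As a rough illustration of candidate selection by word-collocation-based relaxation, the following Python sketch may help. It is not the thesis's algorithm: the update rule, the `select_candidates` function, and the toy collocation table are all assumptions made for this example. Each word position holds several OCR candidates with initial scores, and each iteration boosts a candidate in proportion to how strongly it collocates with the candidates at neighboring positions.

```python
def select_candidates(candidates, collocation, iterations=5):
    """Pick one word per position by a simple relaxation over collocations.

    candidates:  list of dicts, one per word position, mapping each OCR
                 candidate string to a positive initial score.
    collocation: dict mapping frozenset({word_a, word_b}) to a non-negative
                 collocation strength (hypothetical values, e.g. from a corpus).
    """
    scores = [dict(position) for position in candidates]
    for _ in range(iterations):
        updated_positions = []
        for pos, position in enumerate(scores):
            updated = {}
            for word, score in position.items():
                # Support from the candidates at the left and right neighbors,
                # weighted by their current scores.
                support = 0.0
                for neighbor in (pos - 1, pos + 1):
                    if 0 <= neighbor < len(scores):
                        for other, other_score in scores[neighbor].items():
                            strength = collocation.get(frozenset((word, other)), 0.0)
                            support += other_score * strength
                updated[word] = score * (1.0 + support)
            # Renormalize so the scores at each position sum to one.
            total = sum(updated.values())
            updated_positions.append({w: s / total for w, s in updated.items()})
        scores = updated_positions
    return [max(position, key=position.get) for position in scores]


# Toy input: the OCR cannot decide between "farm" and "form".
ocr_candidates = [
    {"the": 1.0},
    {"farm": 0.5, "form": 0.5},
    {"animals": 0.6, "annuals": 0.4},
]
toy_collocation = {frozenset(("farm", "animals")): 1.0}
chosen = select_candidates(ocr_candidates, toy_collocation)
```

In this toy input, the only collocation evidence links "farm" and "animals", so those two candidates reinforce each other and the sketch selects "the farm animals". Where no collocation evidence exists, the initial OCR scores decide.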
