Major components of a complete text reading system

The document image processes used in a recently developed text reading system are described. The system consists of three major components: document analysis, document understanding, and character segmentation/recognition. The document analysis component extracts lines of text from a page for recognition. The document understanding component extracts logical relationships between the document constituents. The character segmentation/recognition component extracts characters from a text line and recognizes them. Experiments on more than a hundred documents show that the proposed approaches to document analysis and document understanding are robust even for multicolumn, multiarticle documents containing graphics and photographs, and that the proposed character segmentation/recognition method is robust enough to cope with omnifont characters that frequently touch each other.
