Segmentation methods for character recognition: from segmentation to document structure analysis

A pattern-oriented segmentation method for optical character recognition that leads to document structure analysis is presented. As a first example, segmentation of touching handwritten numerals is treated. Connected pattern components are extracted, the spatial interrelations between components are measured, and the components are grouped into meaningful character patterns. Stroke shapes are analyzed, and a method of finding the touching positions that separates about 95% of connected numerals correctly is described. Ambiguities are handled by generating multiple hypotheses and verifying them by recognition. An extended form of pattern-oriented segmentation, tabular form recognition, is then considered. Images of tabular forms are analyzed, and frames in the tabular structure are extracted. By identifying semantic relationships between label frames and data frames, information on the form can be properly recognized.
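
The component-extraction and grouping step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names `connected_components` and `group_by_horizontal_overlap`, the choice of 8-connectivity, and the "merge boxes whose column ranges overlap" rule are all assumptions chosen only to make the idea concrete.

```python
from collections import deque


def connected_components(image):
    """Return inclusive bounding boxes (top, left, bottom, right) of the
    8-connected foreground regions in a binary image (1 = ink)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def group_by_horizontal_overlap(boxes):
    """Assumed grouping rule: merge boxes whose column ranges overlap,
    a crude stand-in for grouping components into character patterns."""
    groups = []
    for box in sorted(boxes, key=lambda b: b[1]):  # left-to-right order
        if groups and box[1] <= groups[-1][3]:
            t, l, b, r = groups[-1]
            groups[-1] = (min(t, box[0]), l, max(b, box[2]), max(r, box[3]))
        else:
            groups.append(box)
    return groups
```

Under these assumptions, two fragments stacked in the same columns (for example a broken stroke) are merged into one candidate character, while a component further to the right stays separate. Separating numerals that touch, which the abstract addresses through stroke-shape analysis, is a different problem that this sketch does not attempt.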
