Analyzing form images by using line-shared-adjacent cell relations

We deal with formats whose fields do not have rigidly determined positions and sizes but are instead characterized by topological relations between them. We call such formats "topological formats". The objective of our research is to establish a method for defining a topological format and for detecting fields in form images by using that format. The method has the following characteristics: 1) a line-shared-adjacent (LSA) cell relation and an LSA format are proposed, and a topological format can be defined as an LSA format; 2) the concept of a class hierarchy can be applied to formats: a format unification operator is defined to build the hierarchy, it can be used to generate a superclass format, and it also allows users to generate formats from scanned images; and 3) an LSA format can be converted into an equivalent line-oriented format that can be used to process actual forms. Since the format consists of line connection information, the method is robust to flaws in the line segments extracted from the images. The method was applied to images of sample forms with various flaws, and satisfactory results were obtained.
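To make the LSA idea concrete, here is a minimal Python sketch under stated assumptions; it is not the paper's definition or implementation. It assumes that each cell is a rectangle whose sides are indices of ruling lines rather than pixel coordinates, and that two cells are line-shared-adjacent when one cell's right (or bottom) line is the other's left (or top) line and their spans along that line overlap. The names `Cell`, `lsa_relations`, `matches`, and `unify` are hypothetical. `matches` treats a format as a set of required relations, and `unify` keeps only the relations two formats share, a naive stand-in for the unification operator, whose actual definition the abstract does not give.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Cell:
    """A rectangular cell bounded by ruling lines.

    Hypothetical model: left/right index the sorted vertical ruling lines,
    top/bottom index the sorted horizontal ones (not pixel coordinates).
    """
    name: str
    left: int
    right: int
    top: int
    bottom: int


def _overlap(a0: int, a1: int, b0: int, b1: int) -> bool:
    # Two spans count as overlapping only if they share more than one point.
    return max(a0, b0) < min(a1, b1)


def lsa_relations(cells: list[Cell]) -> set[tuple[str, str, str]]:
    """Collect (a, relation, b) triples for cells that share a ruling line."""
    rels = set()
    for a, b in permutations(cells, 2):
        # a's right border is the same vertical line as b's left border.
        if a.right == b.left and _overlap(a.top, a.bottom, b.top, b.bottom):
            rels.add((a.name, "left-of", b.name))
        # a's bottom border is the same horizontal line as b's top border.
        if a.bottom == b.top and _overlap(a.left, a.right, b.left, b.right):
            rels.add((a.name, "above", b.name))
    return rels


def matches(fmt: set, observed: set) -> bool:
    # Hypothetical matching rule: every relation the format requires
    # must appear among the relations extracted from the image.
    return fmt <= observed


def unify(fmt_a: set, fmt_b: set) -> set:
    # Naive stand-in for unification: keep only shared relations,
    # which yields a more general (superclass-like) format.
    return fmt_a & fmt_b


form = [
    Cell("title", 0, 2, 0, 1),       # header spanning both columns
    Cell("name_label", 0, 1, 1, 2),
    Cell("name_field", 1, 2, 1, 2),
]
print(sorted(lsa_relations(form)))
# [('name_label', 'left-of', 'name_field'), ('title', 'above', 'name_field'), ('title', 'above', 'name_label')]
```

Because the sketch compares line indices rather than coordinates, a cell that is shifted or resized in a scan still yields the same relations, which loosely mirrors the robustness argument above; it does not model broken or missing line segments.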
