A method for table structure analysis using DP matching

This paper presents a novel method for table structure analysis. Many documents have table areas, and some have both table and figure areas. It is very important to be able to classify table and figure areas automatically. Furthermore, in tables, the column and row in which a character string is located are very important pieces of information. To detect and analyze table areas, the following method is applied: First, areas that may contain tables or figures are distinguished from text areas by the presence of horizontal and vertical lines. Next, the areas are assumed to be table areas and are analyzed as such. A judgment is made on whether each of the areas can in fact be a table area or not; in this way, the actual table areas are detected. Finally, the structures of the areas are analyzed and character strings in the areas are arranged by using the DP matching method. This method was applied to sixty-five pages of Japanese technical papers, magazines, manuals for software programs, and pages including 34 table areas, 48 line drawing areas, and 35 image areas. As a result, 96.6 percent of the areas were detected correctly and 91.7 percent of the tables were analyzed and arranged correctly.

[1]  Toyohide Watanabe,et al.  Toward a practical document understanding of table-form documents: its framework and knowledge representation , 1993, Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR '93).

[2]  Nobuyasu Itoh,et al.  DRS: a workstation-based document recognition system for text entry , 1992, Computer.

[3]  Rangachar Kasturi,et al.  Structural recognition of tabulated data , 1993, Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR '93).

[4]  Yuki Hirayama,et al.  A block segmentation method for document images with complicated column structures , 1993, Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR '93).

[5]  Katsuhiko Itonori,et al.  Table structure recognition based on textblock arrangement and ruled line position , 1993, Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR '93).

[6]  Takashi Saitoh,et al.  Document image segmentation and text area ordering , 1993, Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR '93).