Region segmentation for table image with unknown complex structure

In this paper, we describe a system that segments a machine-printed table image of unknown structure into regions and converts it into an HTML file. Some cells of a table are delimited by ruled lines, while others are delimited by omitted ruled lines, i.e., boundaries that are not actually printed. Our table analysis system handles both types of cell. First, the system uses the ruled lines to segment the table into regions. Second, these regions are further segmented into cells along the omitted ruled lines, which are located using indicators such as numerals and characters. Cells may contain several lines of characters, and the system can still convert a table of unknown, complex structure into an HTML file. Computer experiments on various kinds of tables with omitted ruled lines confirm the effectiveness of our region segmentation method.
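The abstract does not say how ruled lines are detected or how omitted ruled lines are inferred from indicators, so the following is only a rough illustration of the two-stage idea, not the authors' method. It assumes a binary image (1 = black) given as a list of rows, a fully bordered table whose ruled lines span its whole width or height, a row or column counting as ruled when at least 90% of its pixels are black, and an omitted vertical ruled line approximated by a blank column gap of at least `min_gap` pixels. That whitespace projection-profile rule is a swapped-in heuristic, not the indicator-based method described above. All function names (`ruled_rows`, `split_region`, `table_to_html`, `demo_image`, ...) and thresholds are hypothetical, and no character recognition is performed, so each `<td>` only records its bounding box.

```python
def ruled_rows(img, thresh=0.9):
    # Stage 1 (assumed): a row is a horizontal ruled line if >= thresh of its pixels are black.
    return [y for y, row in enumerate(img) if sum(row) >= thresh * len(row)]

def ruled_cols(img, thresh=0.9):
    # Same test applied to columns, for vertical ruled lines.
    h = len(img)
    return [x for x in range(len(img[0]))
            if sum(img[y][x] for y in range(h)) >= thresh * h]

def bands(lines):
    # Half-open interiors between consecutive ruled-line indices.
    # Adjacent indices (a line thicker than one pixel) yield no band.
    return [(a + 1, b) for a, b in zip(lines, lines[1:]) if b - a > 1]

def ink_runs(flags):
    # Maximal runs of True values as half-open (start, end) index pairs.
    runs, start = [], None
    for i, f in enumerate(flags):
        if f and start is None:
            start = i
        elif not f and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs

def split_region(img, y0, y1, x0, x1, min_gap=3):
    # Stage 2 (hypothetical heuristic): blank column runs of width >= min_gap are
    # treated as omitted vertical ruled lines; narrower gaps (e.g. between
    # characters of one entry) are merged into the same cell.
    flags = [any(img[y][x] for y in range(y0, y1)) for x in range(x0, x1)]
    cells = []
    for s, e in ink_runs(flags):
        if cells and s - cells[-1][1] < min_gap:
            cells[-1] = (cells[-1][0], e)
        else:
            cells.append((s, e))
    # An empty region is kept as a single blank cell.
    return [(x0 + s, x0 + e) for s, e in cells] or [(x0, x1)]

def table_to_html(img, thresh=0.9, min_gap=3):
    # Each band between horizontal ruled lines becomes one <tr>; every cell
    # found inside it becomes a <td> carrying its box as "y0,y1,x0,x1".
    rows_out = []
    for y0, y1 in bands(ruled_rows(img, thresh)):
        tds = []
        for x0, x1 in bands(ruled_cols(img, thresh)):
            for cx0, cx1 in split_region(img, y0, y1, x0, x1, min_gap):
                tds.append(f'<td data-box="{y0},{y1},{cx0},{cx1}"></td>')
        rows_out.append("<tr>" + "".join(tds) + "</tr>")
    return "<table>\n" + "\n".join(rows_out) + "\n</table>"

def demo_image():
    # Tiny synthetic 16x6 table: ruled rows 0, 3, 5 and ruled columns 0, 9, 15,
    # with a few "ink" pixels standing in for characters.
    w, h = 16, 6
    img = [[0] * w for _ in range(h)]
    for y in (0, 3, 5):
        img[y] = [1] * w
    for y in range(h):
        for x in (0, 9, 15):
            img[y][x] = 1
    for x in (1, 2, 6, 11, 13):
        img[1][x] = 1
    img[4][12] = 1
    return img

if __name__ == "__main__":
    print(table_to_html(demo_image()))
```

In the demo image, the upper-left region has ink at columns 1 to 2 and at column 6, separated by a three-pixel blank gap, so it is split into two cells, while the two ink columns in the upper-right region are only one pixel apart and stay in one cell. A gap rule like this cannot distinguish a column gap from a wide space inside a cell, and it never splits a region horizontally, so multi-line cells are left intact but omitted horizontal ruled lines are missed.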
