The T-Recs Table Recognition and Analysis System

This paper presents a new approach to table structure recognition and layout analysis. The recognition process differs significantly from existing approaches in that it performs a bottom-up clustering of given word segments, whereas conventional table structure recognizers all rely on detecting separators, such as delineation or significant white space, to analyze a page top-down. The subsequent analysis of the recognized layout elements is based on the construction of a tile structure and detects row- and/or column-spanning cells as well as sparse tables with a high degree of confidence. The overall system is completely domain independent and can optionally ignore textual content; it can thus be applied to arbitrary mixed-mode documents (with or without tables) in any language and even operates on low-quality OCR documents (e.g., facsimiles).
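
To illustrate what a bottom-up clustering of word segments can look like, the following minimal sketch groups word bounding boxes into blocks without looking for any separators. It is not taken from the paper: the `Word` record, the `overlaps` test, and the rule that words on directly adjacent lines join the same block when their horizontal extents overlap are all assumptions chosen for illustration, and the tile-structure analysis is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """Hypothetical word segment: text plus its line index and horizontal extent."""
    text: str
    line: int   # index of the text line the word sits on
    x0: float   # left edge
    x1: float   # right edge


def overlaps(a: Word, b: Word) -> bool:
    """True if the horizontal extents of the two words intersect."""
    return a.x0 < b.x1 and b.x0 < a.x1


def cluster_words(words: list[Word]) -> list[list[int]]:
    """Grow blocks bottom-up from seed words (assumed rule, see lead-in).

    A word joins a block if it lies on a line directly above or below a
    word already in the block and overlaps it horizontally.
    Returns blocks as sorted lists of indices into `words`.
    """
    unassigned = set(range(len(words)))
    blocks = []
    while unassigned:
        seed = min(unassigned)
        unassigned.remove(seed)
        block, stack = [seed], [seed]
        while stack:
            i = stack.pop()
            # Iterate over a copy because members are removed while scanning.
            for j in list(unassigned):
                adjacent = abs(words[i].line - words[j].line) == 1
                if adjacent and overlaps(words[i], words[j]):
                    unassigned.remove(j)
                    block.append(j)
                    stack.append(j)
        blocks.append(sorted(block))
    return blocks


# A small two-column table: each column should come out as one block.
words = [
    Word("Name", 0, 0, 40), Word("Price", 0, 100, 150),
    Word("Apple", 1, 0, 50), Word("1.20", 1, 100, 140),
    Word("Pear", 2, 0, 40), Word("0.90", 2, 100, 140),
]
blocks = cluster_words(words)
```

In this toy input the two columns never overlap horizontally, so each column forms its own block even though no white-space gap or ruling line was searched for explicitly.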
