The T-Recs Table Recognition and Analysis System
Abstract: This paper presents a new approach to table structure recognition and layout analysis. The recognition process differs significantly from existing approaches in that it performs a bottom-up clustering of given word segments, whereas conventional table structure recognizers all rely on detecting separators, such as delineation or significant white space, to analyze a page top-down. The subsequent analysis of the recognized layout elements is based on the construction of a tile structure and detects row- and/or column-spanning cells as well as sparse tables with a high degree of confidence. The overall system is completely domain independent and can optionally ignore textual content, so it can be applied to arbitrary mixed-mode documents (with or without tables) in any language, and it even operates on low-quality OCR documents (e.g. facsimiles).
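The bottom-up clustering idea can be illustrated with a minimal sketch. The abstract does not state the clustering criterion, so the rule used here (two words join the same block when they sit on adjacent text lines and their horizontal extents overlap) is an assumption made for illustration, as are the `Word` type and the `cluster_words` function; none of this is taken from the T-Recs implementation. Words on the same line are not merged in this sketch.

```python
from collections import defaultdict
from typing import NamedTuple


class Word(NamedTuple):
    """Hypothetical word segment: text, text-line index, horizontal extent."""
    text: str
    line: int    # index of the text line, counted top to bottom
    x0: float    # left edge
    x1: float    # right edge


def overlaps(a: Word, b: Word) -> bool:
    """True if the horizontal extents of a and b intersect."""
    return a.x0 < b.x1 and b.x0 < a.x1


def cluster_words(words: list[Word]) -> list[list[str]]:
    """Group words into blocks, bottom-up, using union-find.

    Assumed rule (not from the paper): words on adjacent lines whose
    horizontal extents overlap belong to the same block.
    """
    parent = list(range(len(words)))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as root so block order is stable.
            parent[max(ri, rj)] = min(ri, rj)

    for i, a in enumerate(words):
        for j in range(i + 1, len(words)):
            b = words[j]
            if abs(a.line - b.line) == 1 and overlaps(a, b):
                union(i, j)

    blocks: dict[int, list[str]] = defaultdict(list)
    for i, w in enumerate(words):
        blocks[find(i)].append(w.text)
    return list(blocks.values())


if __name__ == "__main__":
    sample = [
        Word("Name", 0, 0, 40), Word("Price", 0, 100, 150),
        Word("Apple", 1, 0, 45), Word("1.20", 1, 105, 140),
        Word("Pear", 2, 0, 38), Word("0.90", 2, 105, 140),
    ]
    for block in cluster_words(sample):
        print(block)
```

On the sample input, the words of each table column end up in their own block without any separator line or white-space gap being detected first, which is the contrast with top-down approaches that the abstract draws.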