Related Papers

The T-Recs Table Recognition and Analysis System

Abstract: This paper presents a new approach to table structure recognition as well as to layout analysis. The recognition process differs significantly from existing approaches in that it performs a bottom-up clustering of given word segments, whereas conventional table structure recognizers all rely on detecting separators such as delineation or significant white space to analyze a page top-down. The subsequent analysis of the recognized layout elements is based on the construction of a tile structure and detects row- and/or column-spanning cells as well as sparse tables with a high degree of confidence. The overall system is completely domain independent, optionally ignores textual content, and can thus be applied to arbitrary mixed-mode documents (with or without tables) in any language; it even operates on low-quality OCR documents (e.g. facsimiles).
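
The abstract does not spell out the clustering rule, so the following is only a minimal sketch of what a bottom-up clustering of word segments could look like. It assumes, purely for illustration, that each word comes with a horizontal extent and a text-line index, and that two words belong to the same block when they sit on adjacent lines and their horizontal extents overlap. The `Word` type, the `overlaps` test, and the `cluster_words` function are hypothetical names, not part of T-Recs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x0: float  # left edge of the word's bounding box
    x1: float  # right edge of the word's bounding box
    line: int  # index of the text line the word sits on


def overlaps(a: Word, b: Word) -> bool:
    """True if the horizontal extents of a and b intersect."""
    return a.x0 < b.x1 and b.x0 < a.x1


def cluster_words(words: list[Word]) -> list[list[Word]]:
    """Grow blocks from seed words (assumed rule, not the paper's exact one).

    A block absorbs every unvisited word on a directly adjacent line
    whose horizontal extent overlaps a word already in the block.
    """
    unvisited = set(range(len(words)))
    blocks = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        block, stack = [seed], [seed]
        while stack:
            cur = words[stack.pop()]
            neighbours = [
                i for i in unvisited
                if abs(words[i].line - cur.line) == 1 and overlaps(words[i], cur)
            ]
            for i in neighbours:
                unvisited.remove(i)
                block.append(i)
                stack.append(i)
        blocks.append([words[i] for i in sorted(block)])
    return blocks


# Toy input: two columns of three lines each, no ruling lines needed.
words = [
    Word("Item", 0, 40, 0), Word("Price", 100, 150, 0),
    Word("Apple", 0, 50, 1), Word("1.20", 105, 145, 1),
    Word("Pear", 0, 40, 2), Word("0.90", 105, 145, 2),
]
blocks = cluster_words(words)
for block in blocks:
    print([w.text for w in block])
```

On this toy input the sketch yields two blocks, one per column, without looking for any separator. It deliberately leaves out merging words that share a line and any of the later tile-structure analysis the abstract mentions.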

References

[1] Stephen V. Rice, et al. The Fourth Annual Test of OCR Accuracy, 1995.

[2] Lawrence O'Gorman, et al. The Document Spectrum for Bottom-Up Page Layout Analysis, 1993.

[3] George Nagy, et al. Hierarchical Representation of Optically Scanned Documents, 1984.

[4] Koichi Kise, et al. Document image segmentation as selection of Voronoi edges, 1997, Proceedings Workshop on Document Image Analysis (DIA'97).

[5] D. H. Chang, et al. Extracting Tabular Information From Text Files, 1996.

[6] Daniela Rus, et al. Using White Space for Automated Document Structuring, 1994.

[7] Kazuo Murota, et al. A Fast Voronoi-Diagram Algorithm With Quaternary Tree Bucketing, 1984, Inf. Process. Lett.

[8] Y. Hirayama, et al. A method for table structure analysis using DP matching, 1995, Proceedings of 3rd International Conference on Document Analysis and Recognition.

[9] Rangachar Kasturi, et al. Structural recognition of tabulated data, 1993, Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR '93).

[10] Kazem Taghva, et al. Autotag: A Tool for Creating Structured Document Collections from Printed Materials, 1998, EP.

[11] Katsuhiko Itonori, et al. Table structure recognition based on textblock arrangement and ruled line position, 1993, Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR '93).

[12] Zhigang Fan, et al. Tabular document recognition, 1994, Electronic Imaging.

[13] Friedrich M. Wahl, et al. Document Analysis System, 1982, IBM J. Res. Dev.

Cited By
Consensus-based table form recognition
Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.
2003
Disentangling the Structure of Tables in Scientific Literature
NLDB
2016
TableNet: Deep Learning Model for End-to-end Table Detection and Tabular Data Extraction from Scanned Document Images
2019 International Conference on Document Analysis and Recognition (ICDAR)
2019
A table-form extraction with artefact removal
SAC '07
2007
TableBank: Table Benchmark for Image-based Table Detection and Recognition
LREC
2019
DocParser: Hierarchical Structure Parsing of Document Renderings
ArXiv
2019
Applying the T-Recs table recognition system to the business letter domain
Proceedings of Sixth International Conference on Document Analysis and Recognition
2001
Pre-Printed and Hand-Filled Table-Form Analysis Aiming Cell Extraction
2008 The Eighth IAPR International Workshop on Document Analysis Systems
2008
DeepTabStR: Deep Learning based Table Structure Recognition
2019 International Conference on Document Analysis and Recognition (ICDAR)
2019
DeepDeSRT: Deep Learning for Detection and Structure Recognition of Tables in Document Images
2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
2017
Three approaches to "industrial" table spotting
Proceedings of Sixth International Conference on Document Analysis and Recognition
2001
Image-based logical document structure recognition
Pattern Analysis and Applications
2014
The HiLeX System for Semantic Information Extraction
Trans. Large Scale Data Knowl. Centered Syst.
2012
Table Structure Extraction with Bi-Directional Gated Recurrent Unit Networks
2019 International Conference on Document Analysis and Recognition (ICDAR)
2019
Table Localization and Segmentation using GAN and CNN
2019 International Conference on Document Analysis and Recognition Workshops (ICDARW)
2019
CDeC-Net: Composite Deformable Cascade Network for Table Detection in Document Images
2020 25th International Conference on Pattern Recognition (ICPR)
2020
Digital mountain: from granite archive to global access
First International Workshop on Document Image Analysis for Libraries, 2004. Proceedings.
2004
Using Layout Data for the Analysis of Scientific Literature
Mining Complex Data
2008
FFD: Figure and Formula Detection from Document Images
2019 Digital Image Computing: Techniques and Applications (DICTA)
2019
g-DICE: graph mining-based document information content exploitation
International Journal on Document Analysis and Recognition (IJDAR)
2015