DeepDeSRT: Deep Learning for Detection and Structure Recognition of Tables in Document Images

This paper presents a novel end-to-end system for table understanding in document images called DeepDeSRT. In particular, the contribution of DeepDeSRT is two-fold. First, it presents a deep learning-based solution for table detection in document images. Secondly, it proposes a novel deep learning-based approach for table structure recognition, i.e. identifying rows, columns, and cell positions in the detected tables. In contrast to existing rule-based methods, which rely on heuristics or additional PDF metadata (like, for example, print instructions, character bounding boxes, or line segments), the presented system is data-driven and does not need any heuristics or metadata to detect as well as to recognize tabular structures in document images. Furthermore, in contrast to most existing table detection and structure recognition methods, which are applicable only to PDFs, DeepDeSRT processes document images, which makes it equally suitable for born-digital PDFs (as they can automatically be converted into images) as well as even harder problems, e.g. scanned documents. To gauge the performance of DeepDeSRT, the system is evaluated on the publicly available ICDAR 2013 table competition dataset containing 67 documents with 238 pages overall. Evaluation results reveal that DeepDeSRT outperforms state-of-the-art methods for table detection and structure recognition and achieves F1-measures of 96.77% and 91.44% for table detection and structure recognition, respectively. Additionally, DeepDeSRT is evaluated on a closed dataset from a real use case of a major European aviation company comprising documents which are highly unlike those in ICDAR 2013. Tested on a randomly selected sample from this dataset, DeepDeSRT achieves high detection accuracy for tables which demonstrates the sound generalization capabilities of our system.

[1]  Miao Fan,et al.  Detecting Table Region in PDF Documents Using Distant Supervision , 2015 .

[2]  Yeongjae Cheon,et al.  PVANet: Lightweight Deep Neural Networks for Real-time Object Detection , 2016, ArXiv.

[3]  Shah Khusro,et al.  On methods and tools of table detection, extraction and annotation in PDF documents , 2015, J. Inf. Sci..

[4]  Edward A. Lee,et al.  Parts that add up to a whole : a framework for the analysis of tables , 2007 .

[5]  Alexey O. Shigarov,et al.  Configurable Table Structure Recognition in Untagged PDF documents , 2016, DocEng.

[6]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Kavita Bala,et al.  Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Luís Torgo,et al.  Design of an end-to-end method to extract information from tables , 2006, International Journal of Document Analysis and Recognition (IJDAR).

[9]  David W. Embley,et al.  Table-processing paradigms: a research survey , 2006, International Journal of Document Analysis and Recognition (IJDAR).

[10]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[11]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[12]  Clément Chatelain,et al.  Learning to Detect Tables in Scanned Document Images Using Line Information , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[13]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[14]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[15]  Francesca Cesarini,et al.  Trainable table location in document images , 2002, Object recognition supported by user interaction for service robots.

[16]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[17]  In Seop Na,et al.  Table Detection from Document Image using Vertical Arrangement of Text Blocks , 2015 .

[18]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[19]  Ana Costa e Silva,et al.  2009 10th International Conference on Document Analysis and Recognition Learning Rich Hidden Markov Models in Document Analysis: Table Location , 2022 .

[20]  Fuchun Sun,et al.  HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[22]  Ying Liu,et al.  Dataset, Ground-Truth and Performance Metrics for Table Detection Evaluation , 2012, 2012 10th IAPR International Workshop on Document Analysis Systems.

[23]  J. Cordy,et al.  A Survey of Table Recognition : Models , Observations , Transformations , and Inferences , 2003 .

[24]  Aurélie Lemaitre,et al.  Recognition of Tables and Forms , 2014 .

[25]  Thomas Kieninger,et al.  The T-Recs Table Recognition and Analysis System , 1998, Document Analysis Systems.

[26]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[27]  Sergio Guadarrama,et al.  Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Zhi Tang,et al.  A Table Detection Method for PDF Documents Based on Convolutional Neural Networks , 2016, 2016 12th IAPR Workshop on Document Analysis Systems (DAS).

[29]  Yalin Wang,et al.  Table structure understanding and its performance evaluation , 2004, Pattern Recognit..

[30]  Katharina Kaiser,et al.  pdf2table: A Method to Extract Table Information from PDF Files , 2005, IICAI.

[31]  Tamir Hassan,et al.  ICDAR 2013 Table Competition , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[32]  Wei Liu,et al.  ParseNet: Looking Wider to See Better , 2015, ArXiv.

[33]  David Doermann,et al.  Handbook of Document Image Processing and Recognition , 2014, Springer London.