A YOLO-Based Table Detection Method

Due to various table layouts and styles, table detection is always a difficult task in the field of document analysis. Inspired by the great progress of deep learning based methods on object detection, in this paper, we present a YOLO-based method for this task. Considering the large difference between document objects and natural objects, we introduce some adaptive adjustments to YOLOv3, including an anchor optimization strategy and two post processing methods. For anchor optimization, we use k-means clustering to find anchors which are more suitable for tables rather than natural objects and make it easier for our model to find exact positions of tables. In post-processing process, the extra whitespaces and noisy page objects (e.g. page headers, page footers) are removed from the predicted results, so that our model can get more accurate table margins and higher IoU scores. The proposed method is evaluated on two datasets from ICDAR 2013 Table Competition and ICDAR 2017 Page Object Detection (POD) Competition and achieves state-of-the-art performance.

[1]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[2]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Andreas Dengel,et al.  Table Recognition in Heterogeneous Documents Using Machine Learning , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[4]  Thomas M. Breuel,et al.  Two Geometric Algorithms for Layout Analysis , 2002, Document Analysis Systems.

[5]  Daniel Kifer,et al.  Multi-Scale Multi-Task FCN for Semantic Page Segmentation and Table Detection , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[6]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[7]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Alexandre V. Evfimievski,et al.  A Rectangle Mining Method for Understanding the Semantics of Financial Tables , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[9]  Tamir Hassan,et al.  Table Recognition and Understanding from PDF Files , 2007, Ninth International Conference on Document Analysis and Recognition (ICDAR 2007).

[10]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Ruiheng Qiu,et al.  A Table Detection Method for Multipage PDF Documents via Visual Seperators and Tabular Structures , 2011, 2011 International Conference on Document Analysis and Recognition.

[12]  Robert M. Haralick,et al.  Recursive X-Y cut using bounding boxes of connected components , 1995, Proceedings of 3rd International Conference on Document Analysis and Recognition.

[13]  Andreas Dengel,et al.  DeCNT: Deep Deformable CNN for Table Detection , 2018, IEEE Access.

[14]  Ersin Yumer,et al.  Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Muhammad Imran Malik,et al.  Table Detection Using Deep Learning , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[16]  Andreas Dengel,et al.  DeepDeSRT: Deep Learning for Detection and Structure Recognition of Tables in Document Images , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[17]  In Seop Na,et al.  Table Detection from Document Image using Vertical Arrangement of Text Blocks , 2015 .

[18]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Maria Gabrani,et al.  Interpreting Data from Scanned Tables , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[21]  Concetto Spampinato,et al.  A Saliency-based Convolutional Neural Network for Table and Chart Detection in Digitized Documents , 2018, ICIAP.

[22]  Thomas G Kieninger,et al.  Table structure recognition based on robust block segmentation , 1998, Electronic Imaging.

[23]  Wolfgang Lehner,et al.  Table Recognition in Spreadsheets via a Graph Representation , 2018, 2018 13th IAPR International Workshop on Document Analysis Systems (DAS).

[24]  Jean-Luc Meunier,et al.  Comparing Machine Learning Approaches for Table Recognition in Historical Register Books , 2018, 2018 13th IAPR International Workshop on Document Analysis Systems (DAS).

[25]  Fei Yin,et al.  Page Object Detection from PDF Document Images by Deep Structured Prediction and Supervised Clustering , 2018, 2018 24th International Conference on Pattern Recognition (ICPR).

[26]  Zhi Tang,et al.  ICDAR2017 Competition on Page Object Detection , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[27]  Zhi Tang,et al.  A Table Detection Method for PDF Documents Based on Convolutional Neural Networks , 2016, 2016 12th IAPR Workshop on Document Analysis Systems (DAS).

[28]  Katharina Kaiser,et al.  pdf2table: A Method to Extract Table Information from PDF Files , 2005, IICAI.

[29]  Tamir Hassan,et al.  ICDAR 2013 Table Competition , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[30]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.