Paper information - The Benefits of Close-Domain Fine-Tuning for Table Detection in Document Images

Correctly localising tables in a document is instrumental for determining their structure and extracting their contents; table detection is therefore a key step in table understanding. Nowadays, the most successful methods for table detection in document images employ deep learning algorithms and, in particular, a technique known as fine-tuning. In this context, fine-tuning transfers the knowledge acquired from detecting objects in natural images to the detection of tables in document images. However, natural images and document images are only loosely related, and fine-tuning works better when the source and target tasks are closely related. In this paper, we show that it is more beneficial to fine-tune from a closer domain. To this end, we train several object detection algorithms (namely, Mask R-CNN, RetinaNet, SSD and YOLO) on the TableBank dataset (a dataset of images of academic documents designed for table detection and recognition), and fine-tune them on several heterogeneous table detection datasets. With this approach, we considerably improve on the accuracy of the detection models fine-tuned from natural images (by 17% on average and by up to 60% in the best case).
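The intuition behind close-domain fine-tuning can be illustrated with a deliberately tiny toy, sketched below. It is not the paper's method and uses no real detector or dataset: each "domain" is reduced to a single hypothetical number standing in for the weights that are optimal for that domain, and training is plain gradient descent on a quadratic loss. All constants (`NATURAL_IMAGES`, `TABLEBANK`, `TARGET_TABLES`, `BUDGET`) are made-up assumptions chosen only to show the two-stage schedule: pre-train on a source domain, then fine-tune on the target with a small step budget.

```python
def train(weight, optimum, steps, lr=0.1):
    """Run gradient descent on the toy loss (weight - optimum) ** 2."""
    for _ in range(steps):
        grad = 2.0 * (weight - optimum)
        weight -= lr * grad
    return weight


def loss(weight, optimum):
    """Toy loss: squared distance to the weights that are optimal for a domain."""
    return (weight - optimum) ** 2


# Hypothetical 1-D stand-ins for the optimal weights of each domain (assumptions).
NATURAL_IMAGES = 0.0   # far source domain
TABLEBANK = 8.0        # close source domain
TARGET_TABLES = 10.0   # target table-detection dataset

INITIAL_WEIGHT = 5.0   # arbitrary starting point before any pre-training
BUDGET = 5             # small number of fine-tuning steps on the target

# Stage 1: pre-train on a source domain until (near) convergence.
pretrained_far = train(INITIAL_WEIGHT, NATURAL_IMAGES, steps=200)
pretrained_close = train(INITIAL_WEIGHT, TABLEBANK, steps=200)

# Stage 2: fine-tune each pre-trained model on the target with the same budget.
far_loss = loss(train(pretrained_far, TARGET_TABLES, BUDGET), TARGET_TABLES)
close_loss = loss(train(pretrained_close, TARGET_TABLES, BUDGET), TARGET_TABLES)

print(f"fine-tuned from far domain:   loss = {far_loss:.4f}")
print(f"fine-tuned from close domain: loss = {close_loss:.4f}")
```

In this toy the model pre-trained on the closer domain ends with the lower target loss, simply because it starts nearer to the target optimum. Real detectors are not this well behaved, so the sketch conveys only the direction of the argument, not the size of the gains reported above.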
