Table Detection Using Deep Learning

Table detection is a crucial step in many document analysis applications as tables are used for presenting essential information to the reader in a structured manner. It is a hard problem due to varying layouts and encodings of the tables. Researchers have proposed numerous techniques for table detection based on layout analysis of documents. Most of these techniques fail to generalize because they rely on hand engineered features which are not robust to layout variations. In this paper, we have presented a deep learning based method for table detection. In the proposed method, document images are first pre-processed. These images are then fed to a Region Proposal Network followed by a fully connected neural network for table detection. The proposed method works with high precision on document images with varying layouts that include documents, research papers, and magazines. We have done our evaluations on publicly available UNLV dataset where it beats Tesseract's state of the art table detection system by a significant margin.

[1]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[2]  Clément Chatelain,et al.  Learning to Detect Tables in Scanned Document Images Using Line Information , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[3]  Gaurav Harit,et al.  Table detection in document images using header and trailer patterns , 2012, ICVGIP '12.

[4]  Tran Tuan Anh,et al.  A hybrid method for table detection from document image , 2015, 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR).

[5]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[6]  Daniel P. Lopresti,et al.  Evaluating the performance of table processing algorithms , 2002, International Journal on Document Analysis and Recognition.

[7]  Ingemar Ragnemalm,et al.  The Euclidean distance transform in arbitrary dimensions , 1992, Pattern Recognit. Lett..

[8]  Ana Costa e Silva,et al.  2009 10th International Conference on Document Analysis and Recognition Learning Rich Hidden Markov Models in Document Analysis: Table Location , 2022 .

[9]  Roshan G. Ragel,et al.  Locating tables in scanned documents for reconstructing and republishing , 2014, 7th International Conference on Information and Automation for Sustainability.

[10]  David G. Kirkpatrick,et al.  Linear Time Euclidean Distance Algorithms , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  Zhi Tang,et al.  A Table Detection Method for PDF Documents Based on Convolutional Neural Networks , 2016, 2016 12th IAPR Workshop on Document Analysis Systems (DAS).

[12]  Yalin Wang,et al.  Automatic table ground truth generation and a background-analysis-based table structure extraction method , 2001, Proceedings of Sixth International Conference on Document Analysis and Recognition.

[13]  Daniel P. Lopresti,et al.  Medium-independent table detection , 1999, Electronic Imaging.

[14]  Ioannis Pratikakis,et al.  Automatic Table Detection in Document Images , 2005, ICAPR.

[15]  D. H. Chang,et al.  Extracting Tabular Information From Text Files , 1996 .

[16]  Thomas Kieninger,et al.  Table Recognition and Labeling Using Intrinsic Layout Features , 1999 .

[17]  Luciano da Fontoura Costa,et al.  2D Euclidean distance transform algorithms: A comparative survey , 2008, CSUR.

[18]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Ruiheng Qiu,et al.  A Table Detection Method for Multipage PDF Documents via Visual Seperators and Tabular Structures , 2011, 2011 International Conference on Document Analysis and Recognition.

[20]  Thomas Kieninger,et al.  An approach towards benchmarking of table structure recognition results , 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[21]  Thomas Kieninger,et al.  Applying the T-Recs table recognition system to the business letter domain , 2001, Proceedings of Sixth International Conference on Document Analysis and Recognition.

[22]  Sekhar Mandal,et al.  A simple and effective table detection system from document images , 2006, International Journal of Document Analysis and Recognition (IJDAR).

[23]  Thomas M. Breuel,et al.  Performance Evaluation and Benchmarking of Six-Page Segmentation Algorithms , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Faisal Shafait,et al.  Table detection in heterogeneous documents , 2010, DAS '10.

[25]  อนิรุธ สืบสิงห์,et al.  Data Mining Practical Machine Learning Tools and Techniques , 2014 .