A Saliency-based Convolutional Neural Network for Table and Chart Detection in Digitized Documents

Deep Convolutional Neural Networks (DCNNs) have recently been applied successfully to a variety of vision and multimedia tasks, driving the development of novel solutions in several application domains. Document analysis is a particularly promising area for DCNNs: the number of available digital documents has reached unprecedented levels, and humans can no longer discover and retrieve all the information these documents contain without the help of automation. In this scenario, DCNNs offer a viable way to automate information extraction from digital documents. Within information extraction from documents, the detection of tables and charts is particularly important, since they provide a visual summary of the most valuable information in a document. Fully automating the extraction of visual information from tables and charts requires techniques that localize them and precisely identify their boundaries. In this paper we address the table/chart detection task with an approach that combines deep convolutional neural networks, graphical models and saliency concepts. In particular, we propose a saliency-based fully-convolutional neural network that performs multi-scale reasoning on visual cues, followed by a fully-connected conditional random field (CRF), to localize tables and charts in digital/digitized documents. Performance analysis carried out on an extended version of ICDAR 2013 (with annotated charts as well as tables) shows that our approach yields promising results, outperforming existing models.
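
As a rough illustration of what localizing tables and charts can mean once a per-pixel score map is available, the sketch below thresholds a toy map and returns one bounding box per connected region. It is not the network or the CRF proposed here: the function name `boxes_from_saliency`, the 0.5 threshold, the 4-connectivity and the toy input are all assumptions made for the example.

```python
from collections import deque
from typing import List, Tuple

# Inclusive pixel box: (row_min, col_min, row_max, col_max).
Box = Tuple[int, int, int, int]


def boxes_from_saliency(saliency: List[List[float]],
                        threshold: float = 0.5) -> List[Box]:
    """Threshold a score map and return one box per 4-connected region.

    Hypothetical helper for illustration only; the threshold and the
    connectivity are assumptions, not values taken from the paper.
    """
    rows = len(saliency)
    cols = len(saliency[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or saliency[r][c] < threshold:
                continue
            # Breadth-first flood fill over the region containing (r, c),
            # tracking the extreme rows and columns reached.
            r0, c0, r1, c1 = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and saliency[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes


# Toy 4x4 score map with two separate high-score regions.
toy_map = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.9, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.9],
]
toy_boxes = boxes_from_saliency(toy_map)
```

In a real pipeline the score map would come from the segmentation model after any refinement step, and the threshold would be tuned on validation data rather than fixed in advance.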
