CNN Based Page Object Detection in Document Images

Abstract: Object detection in natural scenes has been widely researched in the past decade, and many deep learning based methods have achieved good performance on this task. This paper focuses on how to transfer and refine those object detection approaches from natural scene images to document images, and proposes a deep learning based page object (e.g., tables, formulae, figures) detection method. Building on traditional Convolutional Neural Network (CNN) based object detection methods, we redesign the region proposal method, the training strategy, and the network structure, and replace Non-Maximum Suppression (NMS) with a dynamic programming algorithm. The experimental results show that some modules of natural scene object detection approaches must be adjusted in order to process document images well. The proposed method also achieves better performance than existing page object detection methods.
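For context, the conventional greedy NMS step mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard baseline post-processing, not the dynamic programming algorithm proposed in the paper, whose details are not given here. The names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the 0.5 overlap threshold are assumptions made for this example only.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Standard greedy NMS: visit boxes from highest to lowest score and
    keep a box only if it does not overlap an already kept box by more
    than iou_threshold. Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two heavily overlapping candidates and one separate box.
example_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
example_scores = [0.9, 0.8, 0.7]
kept = greedy_nms(example_boxes, example_scores)
```

Because each decision is made greedily by score, a slightly lower-scored candidate is dropped whenever a higher-scored one overlaps it strongly. The abstract reports replacing this step with a dynamic programming algorithm for document images.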
