Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images

In the age of deep learning, researchers have approached graphical object detection in documents as a domain adaptation problem, relying on the pre-training and fine-tuning paradigm to leverage the gains made in the natural image domain. However, the backbones and detection networks used in this way were designed for object detection in natural images and do not consider some critical characteristics of document images: document images are sparse in contextual information, and their graphical page objects are logically clustered. This paper investigates the effectiveness of deep and robust backbones in the document image domain and explores the idea of learnable object proposals through Sparse R-CNN. We show that simple domain adaptation of top-performing object detectors to the document image domain does not lead to better results. Furthermore, we show empirically that detectors based on dense object priors, such as Faster R-CNN, Mask R-CNN, and Cascade Mask R-CNN, are perhaps not best suited for graphical page object detection. Detectors that reduce the number of object candidates while making them learnable are a step towards a better approach. We formulate and evaluate the Sparse R-CNN (SR-CNN) model on the IIIT-AR-13K, PubLayNet, and DocBank datasets and hope to inspire a rethinking of object proposals in the domain of graphical page object detection.
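To make the contrast between dense object priors and a small set of learnable proposals concrete, the sketch below counts the anchors a Faster R-CNN style detector tiles over one input image and compares that with a fixed set of proposal boxes in the style of Sparse R-CNN. It is an illustration under stated assumptions, not the configuration used in this paper: the FPN strides (4 to 64), the three aspect ratios, the 800 x 1333 input size, the count of 100 proposals, and the whole-image initialization of the proposal boxes are all assumed defaults, and the helper names are hypothetical. In Sparse R-CNN the proposal boxes are learned parameters refined during training; the sketch only shows their shape and starting values.

```python
import math

# Assumed Faster R-CNN + FPN anchor layout (a common default, not taken from
# this paper): one anchor size per pyramid level P2-P6, three aspect ratios.
FPN_STRIDES = (4, 8, 16, 32, 64)
ASPECT_RATIOS = (0.5, 1.0, 2.0)


def count_dense_anchors(height, width, strides=FPN_STRIDES, ratios=ASPECT_RATIOS):
    """Count anchors tiled over every location of every feature map.

    Simplification: each feature map is ceil(H / stride) x ceil(W / stride);
    padding the batch to a multiple of 32 is ignored.
    """
    total = 0
    for stride in strides:
        locations = math.ceil(height / stride) * math.ceil(width / stride)
        total += locations * len(ratios)
    return total


def init_learnable_proposals(num_proposals=100):
    """Hypothetical helper: proposal boxes as normalized (cx, cy, w, h).

    Every box starts as the whole image; in Sparse R-CNN such boxes are
    trainable parameters, which this plain-Python sketch does not train.
    """
    return [[0.5, 0.5, 1.0, 1.0] for _ in range(num_proposals)]


def to_absolute_xyxy(box, height, width):
    """Convert a normalized (cx, cy, w, h) box to pixel (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    )


if __name__ == "__main__":
    H, W = 800, 1333  # assumed size of a resized page image
    dense = count_dense_anchors(H, W)
    proposals = init_learnable_proposals(100)
    print(f"dense anchors: {dense}")
    print(f"learnable proposals: {len(proposals)}")
    print(to_absolute_xyxy(proposals[0], H, W))
```

Under these assumptions the dense detector scores roughly 2.7 x 10^5 candidate boxes per page, while the sparse detector starts from 100 boxes whose positions are learned from data, which is the reduction in object candidates the abstract argues for.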
