Paper information: Robust Table Structure Recognition with Dynamic Queries Enhanced Detection Transformer


We present a new table structure recognition (TSR) approach, called TSRFormer, that robustly recognizes the structures of complex tables with geometric distortions in various table images. Unlike previous methods, we formulate table separation line prediction as a line regression problem rather than an image segmentation problem, and propose a new two-stage, dynamic-queries-enhanced, DETR-based separation line regression approach, named DQ-DETR, that predicts separation lines directly from table images. Compared with vanilla DETR, DQ-DETR introduces three improvements that make the two-stage DETR framework work efficiently and effectively for separation line prediction: 1) a new query design, named Dynamic Query, which decouples a single line query into separable point queries and should intuitively improve localization accuracy for regression tasks; 2) a dynamic-queries-based progressive line regression approach, which regresses the points on a line progressively and further improves localization accuracy on distorted tables; 3) a prior-enhanced matching strategy that addresses the slow convergence of DETR. After the separation lines are predicted, a simple relation-network-based cell merging module recovers spanning cells. With these techniques, TSRFormer achieves state-of-the-art performance on several benchmark datasets, including SciTSR, PubTabNet, WTW and FinTabNet. Furthermore, we have validated the robustness and high localization accuracy of our approach on a more challenging real-world in-house dataset containing tables with complex structures, borderless cells, large blank spaces, empty or spanning cells, and distorted or even curved shapes.
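Treating separation line prediction as regression means each line becomes a fixed-size sequence of points, and, as in DETR-style set prediction, predicted lines must be paired one-to-one with ground-truth lines before a regression loss can be computed. The sketch below illustrates only that pairing step, under stated assumptions: each line is K ordered (x, y) points, the matching cost is the mean per-point L1 distance, and the optimal assignment is found by brute force. It is plain DETR-style bipartite matching, not the paper's prior-enhanced matching strategy (whose details the abstract does not give), and the names `line_l1_cost` and `match_lines` are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# Assumption: a separation line is represented as K ordered (x, y) points.
Point = Tuple[float, float]
Line = List[Point]


def line_l1_cost(pred: Line, gt: Line) -> float:
    """Mean L1 distance between corresponding points of two K-point lines."""
    if len(pred) != len(gt):
        raise ValueError("both lines must have the same number of points K")
    total = sum(abs(px - gx) + abs(py - gy) for (px, py), (gx, gy) in zip(pred, gt))
    return total / len(pred)


def match_lines(preds: Sequence[Line], gts: Sequence[Line]) -> List[Tuple[int, int]]:
    """Pair every ground-truth line with a distinct prediction at minimum total cost.

    Brute force over permutations: exact, but only practical for a handful of lines.
    Predictions left unmatched would be supervised as "no line" in a DETR-style loss.
    Returns (prediction_index, ground_truth_index) pairs ordered by ground-truth index.
    """
    if len(gts) > len(preds):
        raise ValueError("need at least as many predictions as ground-truth lines")
    best_cost = float("inf")
    best_perm: Tuple[int, ...] = ()
    # perm[g] is the prediction index assigned to ground-truth line g.
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(line_l1_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [(p, g) for g, p in enumerate(best_perm)]


# Illustrative data: three predicted row separators sampled at x = 0, 50, 100,
# and two ground-truth row separators (hypothetical coordinates).
preds = [
    [(0, 10), (50, 12), (100, 11)],
    [(0, 40), (50, 41), (100, 42)],
    [(0, 80), (50, 80), (100, 80)],
]
gts = [
    [(0, 39), (50, 40), (100, 40)],
    [(0, 10), (50, 10), (100, 10)],
]
print(match_lines(preds, gts))  # [(1, 0), (0, 1)]
```

The third prediction stays unmatched and would be treated as background. In practice the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment` on a cost matrix) replaces the brute-force loop, since the number of permutations grows factorially with the number of lines.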
