EATEN: Entity-Aware Attention for Single Shot Visual Text Extraction

Extracting Text of Interest (ToI) from images is a crucial part of many OCR applications, such as entity recognition of cards, invoices, and receipts. Most existing works employ a complicated engineering pipeline, combining OCR with structured information extraction, to accomplish this task. This paper proposes an Entity-aware Attention Text Extraction Network called EATEN, an end-to-end trainable system that extracts ToIs without any post-processing. In the proposed framework, each entity is parsed by its own entity-aware decoder. Moreover, we introduce a state transition mechanism that further improves the robustness of visual ToI extraction. Given the absence of public benchmarks, we construct a dataset of almost 0.6 million images in three real-world scenarios (train ticket, passport, and business card), which is publicly available at https://github.com/beacandler/EATEN. To the best of our knowledge, EATEN is the first single shot method to extract entities from images. Extensive experiments on these benchmarks demonstrate the state-of-the-art performance of EATEN.
