Evaluating Performance and Accuracy Improvements for Attention-OCR

In this paper, we evaluated a set of potential improvements to the successful Attention-OCR architecture, which is designed to predict multiline text from unconstrained scenes in real-world images. We investigated the impact of several optimizations on the model’s accuracy, including dynamic RNNs (Recurrent Neural Networks), scheduled sampling, BiLSTM (Bidirectional Long Short-Term Memory), and a modified attention model. BiLSTM was found to slightly increase accuracy, while dynamic RNNs and a simpler attention model significantly reduced training time with only a slight decline in accuracy.

Adam Przybylek | Adam Brzeski | Kamil Grinholc | Kamil Nowodworski
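
Of the optimizations named in the abstract, scheduled sampling is the easiest to illustrate in isolation. During training, the decoder is fed either the ground-truth previous character or its own previous prediction, and the probability of using the ground truth decays as training proceeds. The sketch below is a minimal illustration and not the authors' implementation: the inverse sigmoid decay schedule, the constant `k`, and the function names `teacher_forcing_prob` and `choose_decoder_input` are all assumptions made for this example.

```python
import math
import random


def teacher_forcing_prob(step: int, k: float = 1000.0) -> float:
    """Probability of feeding the ground-truth token at a given training step.

    Uses inverse sigmoid decay, k / (k + exp(step / k)). Both the schedule and
    the default k are assumptions chosen for illustration.
    """
    # Clamp the exponent so math.exp cannot overflow for very large steps.
    exponent = min(step / k, 700.0)
    return k / (k + math.exp(exponent))


def choose_decoder_input(ground_truth: str, predicted: str, step: int,
                         rng: random.Random, k: float = 1000.0) -> str:
    """Pick the previous token that is fed to the decoder during training."""
    if rng.random() < teacher_forcing_prob(step, k):
        return ground_truth  # teacher forcing
    return predicted  # the model's own previous output


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 5000, 10000, 20000):
        prob = teacher_forcing_prob(step)
        token = choose_decoder_input("a", "o", step, rng)
        print(f"step={step:>6} p(ground truth)={prob:.4f} fed={token}")
```

Early in training the decoder almost always sees the ground truth. Late in training it mostly sees its own predictions, which is closer to the conditions it faces at inference time.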
