An Extended Evaluation of the Impact of Different Modules in ST-VQA Systems

[1] Ernest Valveny, et al. Scene Text Visual Question Answering, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[2] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Shashank Shekhar, et al. OCR-VQA: Visual Question Answering by Reading Text in Images, 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR).

[5] Matthieu Cord, et al. MUREL: Multimodal Relational Reasoning for Visual Question Answering, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Mickaël Coustaty, et al. Semantic Text Recognition via Visual Question Answering, 2019, 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW).

[7] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[8] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[9] Shashank Shekhar, et al. From Strings to Things: Knowledge-Enabled VQA Model That Can Read and Reason, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[10] Gaofeng Meng, et al. Scene text detection and recognition with advances in deep learning: a survey, 2019, International Journal on Document Analysis and Recognition (IJDAR).

[11] Tomas Mikolov, et al. Enriching Word Vectors with Subword Information, 2016, TACL.

[12] Xinlei Chen, et al. Towards VQA Models That Can Read, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).