An Extended Evaluation of the Impact of Different Modules in ST-VQA Systems
[1] Ernest Valveny, et al. Scene Text Visual Question Answering, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[2] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Shashank Shekhar, et al. OCR-VQA: Visual Question Answering by Reading Text in Images, 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR).
[5] Matthieu Cord, et al. MUREL: Multimodal Relational Reasoning for Visual Question Answering, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Mickaël Coustaty, et al. Semantic Text Recognition via Visual Question Answering, 2019, 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW).
[7] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[8] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[9] Shashank Shekhar, et al. From Strings to Things: Knowledge-Enabled VQA Model That Can Read and Reason, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[10] Gaofeng Meng, et al. Scene text detection and recognition with advances in deep learning: a survey, 2019, International Journal on Document Analysis and Recognition (IJDAR).
[11] Tomas Mikolov, et al. Enriching Word Vectors with Subword Information, 2016, TACL.
[12] Xinlei Chen, et al. Towards VQA Models That Can Read, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).