A multimodal attention fusion network with a dynamic vocabulary for TextVQA
Jun Du | Jiajia Wu | Lirong Dai | Bing Yin | Chen Yang | Jianshu Zhang | Fengren Wang | Xinzhe Jiang | Jinshui Hu
[1] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[2] Xiang Bai,et al. Robust Scene Text Recognition with Automatic Rectification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[4] C. V. Jawahar,et al. Image Retrieval Using Textual Cues , 2013, 2013 IEEE International Conference on Computer Vision.
[5] Shuchang Zhou,et al. EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[7] Xinlei Chen,et al. Towards VQA Models That Can Read , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Christopher D. Manning,et al. Compositional Attention Networks for Machine Reasoning , 2018, ICLR.
[9] Chunheng Wang,et al. End-to-end scene text recognition using tree-structured models , 2014, Pattern Recognit.
[10] Xiang Li,et al. Shape Robust Text Detection With Progressive Scale Expansion Network , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Changming Sun,et al. An End-to-End TextSpotter with Explicit Alignment and Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[12] Jiebo Luo,et al. VizWiz Grand Challenge: Answering Visual Questions from Blind People , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[13] Shijian Lu,et al. ESIR: End-To-End Scene Text Recognition via Iterative Image Rectification , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Yi Li,et al. R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.
[15] Byoung-Tak Zhang,et al. Bilinear Attention Networks , 2018, NeurIPS.
[16] Richard Socher,et al. Dynamic Memory Networks for Visual and Textual Question Answering , 2016, ICML.
[17] Li Fei-Fei,et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Trevor Darrell,et al. Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Lianwen Jin,et al. ICDAR 2019 Competition on Large-Scale Street View Text with Partial Labeling - RRC-LSVT , 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR).
[20] Wenyu Liu,et al. TextBoxes: A Fast Text Detector with a Single Deep Neural Network , 2016, AAAI.
[21] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[22] Jun Du,et al. Radical analysis network for learning hierarchies of Chinese characters , 2020, Pattern Recognit.
[23] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[24] Nick Barnes,et al. Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models , 2020, Pattern Recognit.
[25] Jun Du,et al. TextMountain: Accurate Scene Text Detection via Instance Segmentation , 2018, Pattern Recognit.
[26] Xiang Bai,et al. An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[27] Navdeep Jaitly,et al. Pointer Networks , 2015, NIPS.
[28] Matthieu Cord,et al. MUTAN: Multimodal Tucker Fusion for Visual Question Answering , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[29] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[30] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[31] Christopher D. Manning,et al. GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.
[33] Vladimir I. Levenshtein,et al. Binary codes capable of correcting deletions, insertions, and reversals , 1965.
[34] Cheng-Lin Liu,et al. Realtime multi-scale scene text detection with scale-based region proposal network , 2020, Pattern Recognit.
[35] Ernest Valveny,et al. Scene Text Visual Question Answering , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[36] Dimosthenis Karatzas,et al. Single Shot Scene Text Retrieval , 2018, ECCV.
[37] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[38] Shashank Shekhar,et al. OCR-VQA: Visual Question Answering by Reading Text in Images , 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR).
[39] Hanqing Lu,et al. Improving visual question answering using dropout and enhanced question encoder , 2019, Pattern Recognit.
[40] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[41] Marcin Woźniak,et al. DecomVQANet: Decomposing visual question answering deep network via tensor decomposition and regression , 2021, Pattern Recognit.
[42] Lianwen Jin,et al. A Multi-Object Rectified Attention Network for Scene Text Recognition , 2019, Pattern Recognit.
[43] Jon Almazán,et al. ICDAR 2013 Robust Reading Competition , 2013, 2013 12th International Conference on Document Analysis and Recognition.
[44] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[45] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Ernest Valveny,et al. ICDAR 2015 competition on Robust Reading , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).
[48] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[49] Jiri Matas,et al. COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images , 2016, ArXiv.
[50] Ross B. Girshick,et al. Mask R-CNN , 2017, arXiv:1703.06870.
[51] Yash Goyal,et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[53] Lianwen Jin,et al. Aggregation Cross-Entropy for Sequence Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Ernest Valveny,et al. Word Spotting and Recognition with Embedded Attributes , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[55] Ross B. Girshick,et al. Fast R-CNN , 2015, arXiv:1504.08083.
[56] Xilin Chen,et al. Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[57] Ernest Valveny,et al. ICDAR 2019 Competition on Scene Text Visual Question Answering , 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR).