Multimodal Attention with Image Text Spatial Relationship for OCR-Based Image Captioning
暂无分享,去创建一个
Jiebo Luo | Jinhui Tang | Jing Wang | Jiebo Luo | Jinhui Tang | Jing Wang
[1] Yang Yang,et al. Captioning Videos Using Large-Scale Image Corpus , 2017, Journal of Computer Science and Technology.
[2] Andrew Zisserman,et al. Reading Text in the Wild with Convolutional Neural Networks , 2014, International Journal of Computer Vision.
[3] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] R. Smith,et al. An Overview of the Tesseract OCR Engine , 2007, Ninth International Conference on Document Analysis and Recognition (ICDAR 2007).
[5] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[6] Ye Yuan,et al. Review Networks for Caption Generation , 2016, NIPS.
[7] Jiebo Luo,et al. Image Captioning with Semantic Attention , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[9] Xinlei Chen,et al. Towards VQA Models That Can Read , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Tao Mei,et al. Exploring Visual Relationship for Image Captioning , 2018, ECCV.
[11] Changming Sun,et al. An End-to-End TextSpotter with Explicit Alignment and Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[12] Albert Gordo,et al. Rosetta: Large Scale System for Text Detection and Recognition in Images , 2018, KDD.
[13] Jing Wang,et al. Show, Reward and Tell: Automatic Generation of Narrative Paragraph From Photo Stream by Adversarial Training , 2018, AAAI.
[14] Ernest Valveny,et al. Scene Text Visual Question Answering , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[15] Ernest Valveny,et al. ICDAR 2019 Competition on Scene Text Visual Question Answering , 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR).
[16] Yi Yang,et al. Entangled Transformer for Image Captioning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[17] Jie Chen,et al. Attention on Attention for Image Captioning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[18] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.
[19] Svetlana Lazebnik,et al. Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[20] Yizhou Yu,et al. Cross-Modal Relationship Inference for Grounding Referring Expressions , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Shashank Shekhar,et al. OCR-VQA: Visual Question Answering by Reading Text in Images , 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR).
[22] Jianfei Cai,et al. Auto-Encoding Scene Graphs for Image Captioning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Ronghang Hu,et al. TextCaps: a Dataset for Image Captioning with Reading Comprehension , 2020, ECCV.
[24] Chunhua Shen,et al. Towards End-to-End Text Spotting with Convolutional Recurrent Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[25] Wei Liu,et al. Recurrent Fusion Network for Image Captioning , 2018, ECCV.
[26] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[27] Tao Mei,et al. Deep Collaborative Embedding for Social Image Understanding , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[28] Trevor Darrell,et al. Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Qi Tian,et al. Social Anchor-Unit Graph Regularized Tensor Completion for Large-Scale Image Retagging , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[30] Wenyu Liu,et al. TextBoxes: A Fast Text Detector with a Single Deep Neural Network , 2016, AAAI.
[31] Hongtao Lu,et al. Look Back and Predict Forward in Image Captioning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[33] Tao Mei,et al. Hierarchy Parsing for Image Captioning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[34] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Vaibhava Goel,et al. Self-Critical Sequence Training for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.
[37] Ankush Gupta,et al. Synthetic Data for Text Localisation in Natural Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Junjie Yan,et al. FOTS: Fast Oriented Text Spotting with a Unified Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[39] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[40] Tao Mei,et al. Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation , 2019, IJCAI.
[41] Tomas Mikolov,et al. Enriching Word Vectors with Subword Information , 2016, TACL.
[42] Ernest Valveny,et al. Word Spotting and Recognition with Embedded Attributes , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[43] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[44] Meng Wang,et al. Tri-Clustered Tensor Completion for Social-Aware Image Tag Refinement , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[45] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and VQA , 2017, ArXiv.
[46] Basura Fernando,et al. SPICE: Semantic Propositional Image Caption Evaluation , 2016, ECCV.