Examine before You Answer: Multi-task Learning with Adaptive-attentions for Multiple-choice VQA
Heng Tao Shen | Xianglong Liu | Jingkuan Song | Pengpeng Zeng | Lianli Gao
[1] Hanzhang Wang, et al. Richer Semantic Visual and Language Representation for Video Captioning, 2017, ACM Multimedia.
[2] Kate Saenko, et al. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering, 2015, ECCV.
[3] Heng Tao Shen, et al. Deep Region Hashing for Generic Instance Search from Images, 2018, AAAI.
[4] Richard S. Zemel, et al. Image Question Answering: A Visual Semantic Embedding Model and a New Dataset, 2015, ArXiv.
[5] Saurabh Singh, et al. Where to Look: Focus Regions for Visual Question Answering, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Nicu Sebe, et al. Graph-without-cut: An Ideal Graph Learning for Image Segmentation, 2016, AAAI.
[7] Dianhai Yu, et al. Multi-Task Learning for Multiple Language Translation, 2015, ACL.
[8] Michael S. Bernstein, et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, 2016, International Journal of Computer Vision.
[9] Tao Mei, et al. Multi-level Attention Networks for Visual Question Answering, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Mario Fritz, et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[11] Jiasen Lu, et al. Hierarchical Question-Image Co-Attention for Visual Question Answering, 2016, NIPS.
[12] Zi Huang, et al. Adaptively Attending to Visual Attributes and Linguistic Knowledge for Captioning, 2017, ACM Multimedia.
[13] Bohyung Han, et al. Image Question Answering Using Convolutional Neural Network with Dynamic Parameter Prediction, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Nicu Sebe, et al. Optimal graph learning with partial tags and multiple features for image and video annotation, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Alexander J. Smola, et al. Stacked Attention Networks for Image Question Answering, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and VQA, 2017, ArXiv.
[17] Trevor Darrell, et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding, 2016, EMNLP.
[18] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Xuelong Li, et al. Learning Discriminative Binary Codes for Large-scale Cross-modal Retrieval, 2017, IEEE Transactions on Image Processing.
[20] Jia Chen, et al. Video Captioning with Guidance of Multimodal Latent Topics, 2017, ACM Multimedia.
[21] Richard Socher, et al. Dynamic Memory Networks for Visual and Textual Question Answering, 2016, ICML.
[22] Michael S. Bernstein, et al. Visual7W: Grounded Question Answering in Images, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Yuandong Tian, et al. Simple Baseline for Visual Question Answering, 2015, ArXiv.
[24] Lin Ma, et al. Learning to Answer Questions from Image Using Convolutional Neural Network, 2015, AAAI.
[25] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[26] Jiasen Lu, et al. VQA: Visual Question Answering, 2015, ICCV.
[27] Rongrong Ji, et al. More Than An Answer: Neural Pivot Network for Visual Question Answering, 2017, ACM Multimedia.
[28] Ioannis Patras, et al. Deep Multi-task Learning with Label Correlation Constraint for Video Concept Detection, 2016, ACM Multimedia.
[29] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[30] Wei Xu, et al. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering, 2015, NIPS.
[31] Shuicheng Yan, et al. A Focused Dynamic Attention Model for Visual Question Answering, 2016, ArXiv.
[32] Narendra Ahuja, et al. Robust Visual Tracking via Structured Multi-Task Sparse Learning, 2012, International Journal of Computer Vision.
[33] Huimin Lu, et al. Deep adversarial metric learning for cross-modal retrieval, 2019, World Wide Web.
[34] Yang Yang, et al. Deep Asymmetric Pairwise Hashing, 2017, ACM Multimedia.
[35] Wei Zhang, et al. Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering, 2017, AAAI.
[36] Jung-Woo Ha, et al. Dual Attention Networks for Multimodal Reasoning and Matching, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Wei Liu, et al. Classification by Retrieval: Binarizing Data and Classifiers, 2017, SIGIR.
[38] Saumik Bhattacharya, et al. Visual Saliency Detection Using Spatiotemporal Decomposition, 2018, IEEE Transactions on Image Processing.
[39] Yang Yang, et al. Adversarial Cross-Modal Retrieval, 2017, ACM Multimedia.