A unified cycle-consistent neural model for text and image retrieval
Rita Cucchiara | Lorenzo Baraldi | Hamed R. Tavakoli | Marcella Cornia
[1] Ming Zhou,et al. Learning to Collaborate for Question Answering and Asking , 2018, NAACL.
[2] Rita Cucchiara,et al. M-VAD names: a dataset for video captioning with naming , 2018, Multimedia Tools and Applications.
[3] Rita Cucchiara,et al. Towards Cycle-Consistent Models for Text and Image Retrieval , 2018, ECCV Workshops.
[4] Rita Cucchiara,et al. Meshed-Memory Transformer for Image Captioning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Fei Su,et al. Two-stage deep learning for supervised cross-modal retrieval , 2018, Multimedia Tools and Applications.
[6] Wei Liu,et al. Reconstruction Network for Video Captioning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[7] Wei Wang,et al. Instance-Aware Image and Sentence Matching with Selective Multimodal LSTM , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Qi Wu,et al. FVQA: Fact-Based Visual Question Answering , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[9] Yan Huang,et al. Learning Semantic Concepts and Order for Image and Sentence Matching , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[10] Alexander J. Smola,et al. Sampling Matters in Deep Embedding Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[11] Kurt Keutzer,et al. Dense Point Trajectories by GPU-Accelerated Large Displacement Optical Flow , 2010, ECCV.
[12] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[13] Zhe Gan,et al. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[14] Mario Fritz,et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[15] Rita Cucchiara,et al. Explaining digital humanities by aligning images and textual descriptions , 2020, Pattern Recognit. Lett.
[16] Rita Cucchiara,et al. Visual saliency for image captioning in new multimedia services , 2017, 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW).
[17] Yao Zhao,et al. Modality-Invariant Image-Text Embedding for Image-Sentence Matching , 2019 .
[18] Tie-Yan Liu,et al. Dual Learning for Machine Translation , 2016, NIPS.
[19] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[20] Rita Cucchiara,et al. Aligning Text and Document Illustrations: Towards Visually Explainable Digital Humanities , 2018, 2018 24th International Conference on Pattern Recognition (ICPR).
[21] Bernt Schiele,et al. Generative Adversarial Text to Image Synthesis , 2016, ICML.
[22] Xinlei Chen,et al. Cycle-Consistency for Robust Visual Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[24] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[25] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[26] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Rita Cucchiara,et al. Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention , 2017 .
[28] Liwei Wang,et al. Learning Two-Branch Neural Networks for Image-Text Matching Tasks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[29] Yang Liu,et al. Neural Machine Translation with Reconstruction , 2016, AAAI.
[30] Xiaogang Wang,et al. StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[31] Xiaogang Wang,et al. Deep Dual Learning for Semantic Image Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[32] Jung-Woo Ha,et al. Dual Attention Networks for Multimodal Reasoning and Matching , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Xirong Li,et al. Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction , 2016 .
[34] Gang Wang,et al. Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[35] Jorma Laaksonen,et al. Paying Attention to Descriptions Generated by Image Captioning Models , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[36] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[37] Lior Wolf,et al. Associating neural word embeddings with deep image representations using Fisher Vectors , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Aviv Eisenschtat,et al. Linking Image and Text with 2-Way Nets , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Tomas Mikolov,et al. Enriching Word Vectors with Subword Information , 2016, TACL.
[40] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[41] Jorma Laaksonen,et al. Image and Video Captioning with Augmented Neural Architectures , 2018, IEEE MultiMedia.
[42] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[43] David J. Fleet,et al. VSE++: Improving Visual-Semantic Embeddings with Hard Negatives , 2017, BMVC.
[44] Peter Young,et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics , 2013, J. Artif. Intell. Res.
[45] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[46] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[47] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[48] Dimitris N. Metaxas,et al. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[49] Yi Yang,et al. Modality-Invariant Image-Text Embedding for Image-Sentence Matching , 2019, ACM Trans. Multim. Comput. Commun. Appl.
[50] 拓海 杉山,et al. Learning report on “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks” , 2017 .
[51] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.
[53] Jing Zhang,et al. MirrorGAN: Learning Text-To-Image Generation by Redescription , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Vaibhava Goel,et al. Self-Critical Sequence Training for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.
[56] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[57] Lei Zhu,et al. Unsupervised multi-graph cross-modal hashing for large-scale multimedia retrieval , 2016, Multimedia Tools and Applications.
[58] Michele Nappi,et al. Question action relevance and editing for visual question answering , 2018, Multimedia Tools and Applications.
[59] Ruslan Salakhutdinov,et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.
[60] Yang Yang,et al. Word-to-region attention network for visual question answering , 2018, Multimedia Tools and Applications.
[61] Martin Engilberge,et al. Finding Beans in Burgers: Deep Semantic-Visual Embedding with Localization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[62] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[63] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[64] Xirong Li,et al. Predicting Visual Features From Text for Image and Video Caption Retrieval , 2017, IEEE Transactions on Multimedia.