Cross-modal Retrieval and Synthesis (X-MRS): Closing the Modality Gap in Shared Subspace Learning
Hai Xuan Pham | Vladimir Pavlovic | Ricardo Guerrero
[1] Lada A. Adamic,et al. Recipe recommendation using ingredient networks , 2011, WebSci '12.
[2] Matthieu Guillaumin,et al. Food-101 - Mining Discriminative Components with Random Forests , 2014, ECCV.
[3] Danushka Bollegala,et al. Dividing and Conquering Cross-Modal Recipe Retrieval: from Nearest Neighbours Baselines to SoTA , 2019, ArXiv.
[4] Rico Sennrich,et al. Improving Neural Machine Translation Models with Monolingual Data , 2015, ACL.
[5] Chunyan Miao,et al. Structure-Aware Generation Network for Recipe Generation from Images , 2020, ECCV.
[6] Vladimir Pavlovic,et al. CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval , 2021, AAAI.
[7] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[8] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[9] Quoc V. Le,et al. Unsupervised Data Augmentation for Consistency Training , 2019, NeurIPS.
[10] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[11] Shinsuke Mori,et al. Structure-Aware Procedural Text Generation From an Image Sequence , 2021, IEEE Access.
[12] Amaia Salvador,et al. Learning Cross-Modal Embeddings for Cooking Recipes and Food Images , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[14] Xin Chen,et al. ChineseFoodNet: A large-scale Image Dataset for Chinese Food Recognition , 2017, ArXiv.
[15] Antonio Torralba,et al. Recipe1M+: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[16] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Sanja Fidler,et al. Skip-Thought Vectors , 2015, NIPS.
[18] Saeed Al-Bukhitan,et al. Health, Food and User's Profile Ontologies for Personalized Information Retrieval , 2015, ANT/SEIT.
[19] Christopher D. Manning,et al. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks , 2015, ACL.
[20] Matthew Crosby,et al. Association for the Advancement of Artificial Intelligence , 2014 .
[21] Vladimir Pavlovic,et al. CookGAN: Meal Image Synthesis from Ingredients , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).
[22] Xi Chen,et al. Stacked Cross Attention for Image-Text Matching , 2018, ECCV.
[23] Steven C. H. Hoi,et al. Learning Cross-Modal Embeddings With Adversarial Networks for Cooking Recipes and Food Images , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Jihun Choi,et al. Learning to Compose Task-Specific Tree Structures , 2017, AAAI.
[25] Zhe Gan,et al. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[26] Bin Zhu,et al. CookGAN: Causality Based Text-to-Image Synthesis , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Matthieu Cord,et al. Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings , 2018, SIGIR.
[28] Amaia Salvador,et al. Inverse Cooking: Recipe Generation From Food Images , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Thomas Wolf,et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.
[30] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[32] Michael Donoser,et al. Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Chong-Wah Ngo,et al. Deep Understanding of Cooking Procedure for Cross-modal Recipe Retrieval , 2018, ACM Multimedia.
[34] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[35] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[36] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[37] Shinsuke Mori,et al. Procedural Text Generation from a Photo Sequence , 2019, INLG.
[38] Gang Wang,et al. Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[39] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[40] Willem Zuidema,et al. Quantifying Attention Flow in Transformers , 2020, ACL.
[41] Bernt Schiele,et al. Generative Adversarial Text to Image Synthesis , 2016, ICML.
[42] Yan Huang,et al. ACMM: Aligned Cross-Modal Memory for Few-Shot Image and Sentence Matching , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[43] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[44] Steven C. H. Hoi,et al. Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes With Semantic Consistency and Attention Mechanism , 2020, IEEE Transactions on Multimedia.
[45] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Jianling Sun,et al. MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Dimitris N. Metaxas,et al. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[48] Hongyu Guo,et al. Long Short-Term Memory Over Recursive Structures , 2015, ICML.
[49] Wei Chen,et al. DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-To-Image Synthesis , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).