[1] Jason Weston, et al. Engaging Image Captioning via Personality, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] A. D. Manning, et al. Understanding Comics: The Invisible Art, 1993.
[3] Stefan Lee, et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, 2019, NeurIPS.
[4] Danqi Chen, et al. of the Association for Computational Linguistics: , 2001.
[5] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[6] Matthew Stone, et al. CITE: A Corpus of Image-Text Discourse Relations, 2019, NAACL.
[7] Yorick Wilks, et al. NLP for Indexing and Retrieval of Captioned Photographs, 2003, EACL.
[8] J. Hobbs. On the coherence and structure of discourse, 1985.
[9] Chong-Wah Ngo, et al. Deep Understanding of Cooking Procedure for Cross-modal Recipe Retrieval, 2018, ACM Multimedia.
[10] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[11] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[12] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Alex Lascarides, et al. Logics of Conversation, 2005, Studies in natural language processing.
[14] Amaia Salvador, et al. Learning Cross-Modal Embeddings for Cooking Recipes and Food Images, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Marc'Aurelio Ranzato, et al. DeViSE: A Deep Visual-Semantic Embedding Model, 2013, NIPS.
[16] Petr Sojka, et al. Software Framework for Topic Modelling with Large Corpora, 2010.
[17] Jianfeng Gao, et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks, 2020, ECCV.
[18] Matthew Stone, et al. A Formal Semantic Analysis of Gesture, 2009, J. Semant.
[19] Lucas Beyer, et al. In Defense of the Triplet Loss for Person Re-Identification, 2017, ArXiv.
[20] Radu Soricut, et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning, 2018, ACL.
[21] Maite Taboada, et al. Applications of Rhetorical Structure Theory, 2006.
[22] Vladimir Pavlovic, et al. CookGAN: Meal Image Synthesis from Ingredients, 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).
[23] Nazli Ikizler-Cinbis, et al. RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes, 2018, EMNLP.
[24] Matthew Stone, et al. Cross-modal Coherence Modeling for Caption Generation, 2020, ACL.
[25] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[26] Frank Keller, et al. Query-by-Example Image Retrieval using Visual Dependency Representations, 2014, COLING.