暂无分享,去创建一个
[1] James Glass,et al. Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech , 2020, ICLR.
[2] James R. Glass,et al. Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input , 2018, ECCV.
[3] Michael Roth,et al. Visually grounded cross-lingual keyword spotting in speech , 2018, SLTU.
[4] Mirjam Ernestus,et al. Language learning using Speech to Image retrieval , 2019, INTERSPEECH.
[5] Felix Hill,et al. Learning Distributed Representations of Sentences from Unlabelled Data , 2016, NAACL.
[6] Claire Cardie,et al. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability , 2015, *SEMEVAL.
[7] Allan Jabri,et al. Learning Visually Grounded Sentence Representations , 2018, NAACL.
[8] Peter Young,et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics , 2013, J. Artif. Intell. Res..
[9] Danny Merkx,et al. Learning semantic sentence representations from visually grounded language without lexical knowledge , 2019, Natural Language Engineering.
[10] Xinlei Chen,et al. Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.
[11] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[13] Richard A. Harshman,et al. Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..
[14] Holger Schwenk,et al. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data , 2017, EMNLP.
[15] Douwe Kiela,et al. SentEval: An Evaluation Toolkit for Universal Sentence Representations , 2018, LREC.
[16] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Ray Kurzweil,et al. Learning Semantic Textual Similarity from Conversations , 2018, Rep4NLP@ACL.
[18] Laurent Besacier,et al. Word Recognition, Competition, and Activation in a Model of Visually Grounded Speech , 2019, CoNLL.
[19] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[20] John B. Goodenough,et al. Contextual correlates of synonymy , 1965, CACM.
[21] Bolei Zhou,et al. Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.
[22] Sanja Fidler,et al. Skip-Thought Vectors , 2015, NIPS.
[23] James R. Glass,et al. Deep multimodal semantic embeddings for speech and images , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[24] James R. Glass,et al. Text-Free Image-to-Speech Synthesis Using Learned Segmental Units , 2020, ACL.
[25] Luc Van Gool,et al. The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.
[26] Alex Pentland,et al. Learning words from natural audio-visual input , 1998, ICSLP.