Catplayinginthesnow: Impact of Prior Segmentation on a Model of Visually Grounded Speech
[1] Peter Young, et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics, 2013, J. Artif. Intell. Res.
[2] Alex Pentland, et al. Learning words from sights and sounds: a computational model, 2002, Cogn. Sci.
[3] Danny Merkx, et al. Learning semantic sentence representations from visually grounded language without lexical knowledge, 2019, Natural Language Engineering.
[4] Mirjam Ernestus, et al. Language learning using Speech to Image retrieval, 2019, INTERSPEECH.
[5] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[6] James R. Glass, et al. Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Laurent Besacier, et al. Word Recognition, Competition, and Activation in a Model of Visually Grounded Speech, 2019, CoNLL.
[8] P. Jusczyk, et al. Infants' Detection of the Sound Patterns of Words in Fluent Speech, 1995, Cognitive Psychology.
[9] J. Vroomen, et al. Lexical access of resyllabified words: Evidence from phoneme monitoring, 1999, Memory & Cognition.
[10] James R. Glass, et al. Towards Bilingual Lexicon Discovery From Visually Grounded Speech Audio, 2019, INTERSPEECH.
[11] Gregory Shakhnarovich, et al. Visually Grounded Learning of Keyword Prediction from Untranscribed Speech, 2017, INTERSPEECH.
[12] Florian Schiel, et al. Multilingual processing of speech via web services, 2017, Comput. Speech Lang.
[13] James R. Glass, et al. Unsupervised Learning of Spoken Language with Visual Context, 2016, NIPS.
[14] Laurent Besacier, et al. Models of Visually Grounded Speech Signal Pay Attention to Nouns: A Bilingual Experiment on English and Japanese, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Otto Jespersen, et al. Monosyllabism in English, 1928.
[16] James R. Glass, et al. Deep multimodal semantic embeddings for speech and images, 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[17] Artem Sokolov, et al. Learning to Segment Inputs for NMT Favors Character-Level Processing, 2018, IWSLT.
[18] Hung-yi Lee, et al. Audio Word2vec: Sequence-to-Sequence Autoencoding for Unsupervised Learning of Audio Segmentation and Representation, 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[19] James Glass, et al. Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech, 2020, ICLR.
[20] Kunio Kashino, et al. Trilingual Semantic Embeddings of Visually Grounded Speech with Self-Attention Mechanisms, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[22] Grzegorz Chrupala, et al. Representations of language in a model of visually grounded speech signal, 2017, ACL.