Models of Visually Grounded Speech Signal Pay Attention to Nouns: A Bilingual Experiment on English and Japanese
William N. Havard | Jean-Pierre Chevrot | Laurent Besacier
[1] Dedre Gentner, et al. Why Nouns Are Learned before Verbs: Linguistic Relativity Versus Natural Partitioning. Technical Report No. 257, 1982.
[2] Peter Young, et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions, 2014, TACL.
[3] Florian Schiel, et al. Multilingual processing of speech via web services, 2017, Comput. Speech Lang.
[4] Grzegorz Chrupala, et al. Representation of Linguistic Form and Function in Recurrent Neural Networks, 2016, CL.
[5] Akikazu Takeuchi, et al. STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset, 2017, ACL.
[6] Morgan Sonderegger, et al. Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi, 2017, INTERSPEECH.
[7] Gregory Shakhnarovich, et al. Visually Grounded Learning of Keyword Prediction from Untranscribed Speech, 2017, INTERSPEECH.
[8] Slav Petrov, et al. A Universal Part-of-Speech Tagset, 2011, LREC.
[9] Grzegorz Chrupala, et al. Encoding of phonology in a recurrent neural model of grounded speech, 2017, CoNLL.
[10] E. Gibson, et al. Principles of Perceptual Learning and Development, 1973.
[11] James R. Glass, et al. Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input, 2018, ECCV.
[12] D. Slobin, et al. Studies of child language development, 1973.
[13] Helmut Schmid. Probabilistic part-of-speech tagging using decision trees, 1994.
[14] James R. Glass, et al. Unsupervised Learning of Spoken Language with Visual Context, 2016, NIPS.
[15] Olivier Rosec, et al. SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO Data Set, 2017, ArXiv.
[16] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[17] James R. Glass, et al. Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech, 2018, ICASSP.
[18] Emmanuel Dupoux, et al. Cognitive science in the era of artificial intelligence: A roadmap for reverse-engineering the infant language-learner, 2016, Cognition.
[19] Etsuko Haryu, et al. Use of bound morphemes (noun particles) in word segmentation by Japanese-learning infants, 2016.
[20] Grzegorz Chrupala, et al. Representations of language in a model of visually grounded speech signal, 2017, ACL.
[21] Frank Keller, et al. Image Pivoting for Learning Multilingual Multimodal Representations, 2017, EMNLP.
[22] Graham Neubig, et al. Pointwise Prediction for Robust, Adaptable Japanese Morphological Analysis, 2011, ACL.
[23] James R. Glass, et al. Deep multimodal semantic embeddings for speech and images, 2015, ASRU.
[24] James R. Glass, et al. Learning Word-Like Units from Joint Audio-Visual Analysis, 2017, ACL.
[25] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.