Multimodal Word Discovery and Retrieval With Spoken Descriptions and Visual Concepts