Towards Visually Grounded Sub-word Speech Unit Discovery
暂无分享,去创建一个
[1] Aren Jansen,et al. A segmental framework for fully-unsupervised large-vocabulary speech recognition , 2016, Comput. Speech Lang..
[2] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[3] James R. Glass,et al. Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] James R. Glass,et al. Unsupervised Learning of Spoken Language with Visual Context , 2016, NIPS.
[5] James R. Glass,et al. Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input , 2018, ECCV.
[6] Kenneth Ward Church,et al. Towards spoken term discovery at scale with zero resources , 2010, INTERSPEECH.
[7] Grzegorz Chrupala,et al. Representations of language in a model of visually grounded speech signal , 2017, ACL.
[8] James R. Glass,et al. A Nonparametric Bayesian Approach to Acoustic Model Discovery , 2012, ACL.
[9] Florian Metze,et al. Linguistic Unit Discovery from Multi-Modal Inputs in Unwritten Languages: Summary of the “Speaking Rosetta” JSALT 2017 Workshop , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Sanjeev Khudanpur,et al. Unsupervised Learning of Acoustic Sub-word Units , 2008, ACL.
[11] Emmanuel Dupoux,et al. Learning Words from Images and Speech , 2014 .
[12] Gregory Shakhnarovich,et al. Visually Grounded Learning of Keyword Prediction from Untranscribed Speech , 2017, INTERSPEECH.
[13] Odette Scharenborg,et al. Unsupervised speech segmentation: an analysis of the hypothesized phone boundaries. , 2010, The Journal of the Acoustical Society of America.
[14] Virginia R. de Sa,et al. Learning Classification with Unlabeled Data , 1993, NIPS.
[15] James Glass,et al. Analysis of Audio-Visual Features for Unsupervised Speech Recognition , 2017 .
[16] Herbert Gish,et al. Unsupervised training of an HMM-based speech recognizer for topic classification , 2009, INTERSPEECH.
[17] James R. Glass,et al. Deep multimodal semantic embeddings for speech and images , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[18] Aren Jansen,et al. Weak top-down constraints for unsupervised acoustic model training , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[19] Grzegorz Chrupala,et al. Encoding of phonology in a recurrent neural model of grounded speech , 2017, CoNLL.
[20] Aren Jansen,et al. Unsupervised Learning of Semantic Audio Representations , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] James R. Glass,et al. Unsupervised Pattern Discovery in Speech , 2008, IEEE Transactions on Audio, Speech, and Language Processing.
[22] Bolei Zhou,et al. Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.
[23] Lukás Burget,et al. Variational Inference for Acoustic Unit Discovery , 2016, Workshop on Spoken Language Technologies for Under-resourced Languages.
[24] Carla Teixeira Lopes,et al. TIMIT Acoustic-Phonetic Continuous Speech Corpus , 2012 .
[25] Alex Pentland,et al. Learning words from sights and sounds: a computational model , 2002, Cogn. Sci..
[26] Okko Johannes Räsänen,et al. Blind Phoneme Segmentation With Temporal Prediction Errors , 2016, ACL.
[27] Okko Johannes Räsänen,et al. Basic cuts revisited: Temporal segmentation of speech into phone-like units with statistical learning at a pre-linguistic level , 2014, CogSci.
[28] James R. Glass,et al. Learning Word-Like Units from Joint Audio-Visual Analysis , 2017, ACL.
[29] James Philbin,et al. FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] James Bailey,et al. Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance , 2010, J. Mach. Learn. Res..
[31] Jonathan G. Fiscus,et al. Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST , 1993 .
[32] James R. Glass,et al. Unsupervised Lexicon Discovery from Acoustic Input , 2015, TACL.