Zero-shot Learning for Speech Recognition with Universal Phonetic Model
Florian Metze | Alan W. Black | Xinjian Li | Siddharth Dalmia | David R. Mortensen
[1] Themos Stafylakis,et al. Zero-shot keyword spotting for visual speech recognition in-the-wild , 2018, ECCV.
[2] Satoshi Nakamura,et al. Feature optimized DPGMM clustering for unsupervised subword modeling: A contribution to zerospeech 2017 , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[3] Sharon Goldwater,et al. Multilingual bottleneck features for subword modeling in zero-resource languages , 2018, INTERSPEECH.
[4] Ngoc Thang Vu,et al. Multilingual deep neural network based acoustic modeling for rapid language adaptation , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Florian Metze,et al. Sequence-Based Multi-Lingual Low Resource Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Tanja Schultz,et al. Language-independent and language-adaptive acoustic modeling for speech recognition , 2001, Speech Commun.
[7] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] James R. Glass,et al. One-shot learning of generative speech concepts , 2014, CogSci.
[9] Daniel Jurafsky,et al. Lexicon-Free Conversational Speech Recognition with Neural Networks , 2015, NAACL.
[10] Yajie Miao,et al. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[11] Sanja Fidler,et al. Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[12] G. D. Magoulas,et al. Under review as a conference paper at ICLR 2017 , 2017 .
[13] Mark Hasegawa-Johnson,et al. Building an ASR System for Mboshi Using A Cross-Language Definition of Acoustic Units Approach , 2018, SLTU.
[14] Lukás Burget,et al. Sequence-discriminative training of deep neural networks , 2013, INTERSPEECH.
[15] Gabriel Synnaeve,et al. Wav2Letter: an End-to-End ConvNet-based Speech Recognition System , 2016, ArXiv.
[16] Geoffrey E. Hinton,et al. Zero-shot Learning with Semantic Output Codes , 2009, NIPS.
[17] Trevor Darrell,et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.
[18] Mark J. F. Gales,et al. Data augmentation for low resource languages , 2014, INTERSPEECH.
[19] James R. Glass. Towards unsupervised speech processing , 2012, 2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA).
[20] Hui Lin,et al. A study on multilingual acoustic modeling for large vocabulary ASR , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[21] Andrew Y. Ng,et al. Improving Word Representations via Global Context and Multiple Word Prototypes , 2012, ACL.
[22] P. Ladefoged. A course in phonetics , 1975 .
[23] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Bernt Schiele,et al. Evaluating knowledge transfer and zero-shot learning in a large-scale setting , 2011, CVPR 2011.
[25] Stan Davis,et al. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences , 1980 .
[26] Christoph H. Lampert,et al. Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[27] A. Waibel,et al. Improving Phoneme Set Discovery for Documenting , 2017 .
[28] Ralf Schlüter,et al. Investigation on cross- and multilingual MLP features under matched and mismatched acoustical conditions , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[29] Arthur M. Jacobs,et al. Statistical analysis of the bidirectional inconsistency of spelling and sound in French , 1996 .
[30] Tanja Schultz,et al. Integrating multilingual articulatory features into speech recognition , 2003, INTERSPEECH.
[31] John C. Wells,et al. Computer-coding the IPA: a proposed extension of SAMPA , 1995 .
[32] Geoffrey E. Hinton,et al. Regularizing Neural Networks by Penalizing Confident Output Distributions , 2017, ICLR.
[33] Ngoc Thang Vu,et al. Multilingual multilayer perceptron for rapid language adaptation between and across language families , 2013, INTERSPEECH.
[34] Hynek Hermansky,et al. Cross-lingual and multi-stream posterior features for low resource LVCSR systems , 2010, INTERSPEECH.
[35] Katrin Kirchhoff. Combining articulatory and acoustic information for speech recognition in noisy and reverberant environments , 1998, ICSLP.
[36] Hervé Bourlard,et al. An Investigation of Deep Neural Networks for Multilingual Speech Recognition Training and Adaptation , 2017, INTERSPEECH.
[37] Lukás Burget,et al. Topic identification of spoken documents using unsupervised acoustic unit discovery , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] P. Lewis. Ethnologue : languages of the world , 2009 .
[39] Chris Dyer,et al. PanPhon: A Resource for Mapping IPA Segments to Articulatory Feature Vectors , 2016, COLING.
[40] Kristen Grauman,et al. Zero-shot recognition with unreliable attributes , 2014, NIPS.
[41] Sanjeev Khudanpur,et al. Topic Identification for Speech Without ASR , 2017, INTERSPEECH.
[42] Siddharth Dalmia,et al. Epitran: Precision G2P for Many Languages , 2018, LREC.
[43] Florian Metze,et al. Domain Robust Feature Extraction for Rapid Low Resource ASR Development , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[44] Aren Jansen,et al. The zero resource speech challenge 2017 , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[45] Rainer Stiefelhagen,et al. Recovering the Missing Link: Predicting Class-Attribute Associations for Unsupervised Zero-Shot Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Karen Livescu,et al. An embedded segmental K-means model for unsupervised segmentation and clustering of speech , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[47] Tanja Schultz,et al. Multilingual articulatory features , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).
[48] Silvio Savarese,et al. Recognizing human actions by attributes , 2011, CVPR 2011.
[49] Brian Kan-Wing Mak,et al. Multitask Learning of Deep Neural Networks for Low-Resource Speech Recognition , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[50] Christoph H. Lampert,et al. Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[51] John J. Godfrey,et al. SWITCHBOARD: telephone speech corpus for research and development , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[52] Naoyuki Kanda,et al. Elastic spectral distortion for low resource speech recognition with deep neural networks , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[53] Ulrike Mosel,et al. Essentials of language documentation , 2006 .
[54] E. Vajda. Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet , 2000 .
[55] Laurent Besacier,et al. Developments of Swahili resources for an automatic speech recognition system , 2012, SLTU.
[56] Chong Wang,et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[57] Paul Deléglise,et al. TED-LIUM: an Automatic Speech Recognition dedicated corpus , 2012, LREC.
[58] Philip H. S. Torr,et al. An embarrassingly simple approach to zero-shot learning , 2015, ICML.
[59] Geoffrey Zweig,et al. Achieving Human Parity in Conversational Speech Recognition , 2016, ArXiv.
[60] David Miller,et al. The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text , 2004, LREC.
[61] Andrew Y. Ng,et al. Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.
[62] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[63] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.
[64] A. Waibel,et al. Towards Improving Low-Resource Speech Recognition Using Articulatory and Language Features , 2016, IWSLT.
[65] Solomon Teferra Abate,et al. Using different acoustic, lexical and language modeling units for ASR of an under-resourced language - Amharic , 2014, Speech Commun.
[66] Bogdan Ludusan,et al. Bridging the gap between speech technology and natural language processing: an evaluation toolbox for term discovery systems , 2014, LREC.
[67] Kenneth Ward Church,et al. A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.