[1] Yonghui Wu,et al. ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context , 2020, INTERSPEECH.
[2] Laurent Besacier,et al. Developments of Swahili resources for an automatic speech recognition system , 2012, SLTU.
[3] Gabriel Synnaeve,et al. Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training , 2021, Interspeech.
[4] Benjamin van Niekerk,et al. Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge , 2020, INTERSPEECH.
[5] James R. Glass,et al. Unsupervised Lexicon Discovery from Acoustic Input , 2015, TACL.
[6] Guillaume Lample,et al. Cross-lingual Language Model Pretraining , 2019, NeurIPS.
[7] Chris Dyer,et al. Learning Robust and Multilingual Speech Representations , 2020, FINDINGS.
[8] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[9] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[10] Gabriel Synnaeve,et al. slimIPL: Language-Model-Free Iterative Pseudo-Labeling , 2020, Interspeech.
[11] Hao Tang,et al. An Unsupervised Autoregressive Model for Speech Representation Learning , 2019, INTERSPEECH.
[12] Gabriel Synnaeve,et al. Semi-Supervised Speech Recognition via Local Prior Matching , 2020, ArXiv.
[13] Francis M. Tyers,et al. Common Voice: A Massively-Multilingual Speech Corpus , 2020, LREC.
[14] Sarah L. Nesbeitt. Ethnologue: Languages of the World , 1999 .
[15] Tara N. Sainath,et al. Fundamental Technologies in Modern Speech Recognition , 2012 . DOI: 10.1109/MSP.2012.2205597
[16] Eneko Agirre,et al. Learning bilingual word embeddings with (almost) no bilingual data , 2017, ACL.
[17] Armand Joulin,et al. Unsupervised Pretraining Transfers Well Across Languages , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Ronan Collobert,et al. wav2vec: Unsupervised Pre-training for Speech Recognition , 2019, INTERSPEECH.
[19] Lukás Burget,et al. Variational Inference for Acoustic Unit Discovery , 2016, Workshop on Spoken Language Technologies for Under-resourced Languages.
[20] Hervé Bourlard,et al. Connectionist Speech Recognition: A Hybrid Approach , 1993 .
[21] Ronan Collobert,et al. Unsupervised Cross-lingual Representation Learning for Speech Recognition , 2020, Interspeech.
[22] Hermann Ney,et al. CTC in the Context of Generalized Full-Sum HMM Training , 2017, INTERSPEECH.
[23] Fernando Pereira,et al. Weighted finite-state transducers in speech recognition , 2002, Comput. Speech Lang..
[24] Benoît Sagot,et al. What Does BERT Learn about the Structure of Language? , 2019, ACL.
[25] Tie-Yan Liu,et al. Dual Learning for Machine Translation , 2016, NIPS.
[26] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[27] Vineel Pratap,et al. Differentiable Weighted Finite-State Transducers , 2020, ArXiv.
[28] Frantisek Grézl,et al. Multilingually trained bottleneck features in spoken language recognition , 2017, Comput. Speech Lang..
[29] Mehryar Mohri,et al. Finite-State Transducers in Language and Speech Processing , 1997, CL.
[30] Mario Marchand,et al. Domain-adversarial training of neural networks , 2016 .
[31] Rico Sennrich,et al. Improving Neural Machine Translation Models with Monolingual Data , 2015, ACL.
[32] Navdeep Jaitly,et al. Towards Better Decoding and Language Model Integration in Sequence to Sequence Models , 2016, INTERSPEECH.
[33] Jeff Johnson,et al. Billion-Scale Similarity Search with GPUs , 2017, IEEE Transactions on Big Data.
[34] Gabriel Synnaeve,et al. Iterative Pseudo-Labeling for Speech Recognition , 2020, INTERSPEECH.
[35] Shrikanth S. Narayanan,et al. Pykaldi: A Python Wrapper for Kaldi , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] James R. Glass,et al. Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech , 2018, INTERSPEECH.
[37] Dipanjan Das,et al. BERT Rediscovers the Classical NLP Pipeline , 2019, ACL.
[38] J. Werker,et al. Developmental changes in perception of nonnative vowel contrasts. , 1994, Journal of experimental psychology. Human perception and performance.
[39] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[40] Karen Livescu,et al. An embedded segmental K-means model for unsupervised segmentation and clustering of speech , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[41] Chengzhu Yu,et al. Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching , 2018, ICLR.
[42] Jonathan Le Roux,et al. Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[43] Cordelia Schmid,et al. Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[44] Rohit Prabhavalkar,et al. Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[45] Lalit R. Bahl,et al. Maximum mutual information estimation of hidden Markov model parameters for speech recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.
[46] Yu Zhang,et al. Conformer: Convolution-augmented Transformer for Speech Recognition , 2020, INTERSPEECH.
[47] James R. Glass,et al. Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.
[48] G. Zweig,et al. Fast, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces , 2020, INTERSPEECH.
[49] Herbert Gish,et al. Unsupervised training of an HMM-based speech recognizer for topic classification , 2009, INTERSPEECH.
[50] James Glass,et al. Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech , 2020, ICLR.
[51] Kenneth Heafield,et al. KenLM: Faster and Smaller Language Model Queries , 2011, WMT@EMNLP.
[52] Michael C. Frank,et al. Unsupervised word discovery from speech using automatic segmentation into syllable-like units , 2015, INTERSPEECH.
[53] Armand Joulin,et al. Libri-Light: A Benchmark for ASR with Limited or No Supervision , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] James R. Glass,et al. Towards Visually Grounded Sub-word Speech Unit Discovery , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[55] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[56] Quoc V. Le,et al. Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition , 2020, ArXiv.
[57] Solomon Teferra Abate,et al. Using different acoustic, lexical and language modeling units for ASR of an under-resourced language - Amharic , 2014, Speech Commun..
[58] Joseph Keshet,et al. Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation , 2020, INTERSPEECH.
[59] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[60] Sanjeev Khudanpur,et al. Unsupervised Learning of Acoustic Sub-word Units , 2008, ACL.
[61] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[62] P. Jusczyk,et al. Clauses are perceptual units for young infants , 1987, Cognition.
[63] Lalit R. Bahl,et al. A Maximum Likelihood Approach to Continuous Speech Recognition , 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[64] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[65] Yoshua Bengio,et al. Light Gated Recurrent Units for Speech Recognition , 2018, IEEE Transactions on Emerging Topics in Computational Intelligence.
[66] James R. Glass,et al. A Nonparametric Bayesian Approach to Acoustic Model Discovery , 2012, ACL.
[67] Ruslan Salakhutdinov,et al. Hubert: How Much Can a Bad Teacher Benefit ASR Pre-Training? , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[68] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[69] Quoc V. Le,et al. Exploiting Similarities among Languages for Machine Translation , 2013, ArXiv.
[70] Aaron C. Courville,et al. Improved Training of Wasserstein GANs , 2017, NIPS.
[71] Aren Jansen,et al. A segmental framework for fully-unsupervised large-vocabulary speech recognition , 2016, Comput. Speech Lang..
[72] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[73] Chong Wang,et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[74] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[75] Guillaume Lample,et al. Unsupervised Machine Translation Using Monolingual Corpora Only , 2017, ICLR.
[76] Eneko Agirre,et al. Unsupervised Neural Machine Translation , 2017, ICLR.
[77] Lin-Shan Lee,et al. Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings , 2018, INTERSPEECH.
[78] Alexei Baevski,et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations , 2020, NeurIPS.
[79] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[80] Gabriel Synnaeve,et al. MLS: A Large-Scale Multilingual Dataset for Speech Research , 2020, INTERSPEECH.
[81] Quoc V. Le,et al. Improved Noisy Student Training for Automatic Speech Recognition , 2020, INTERSPEECH.
[82] Meng Li,et al. Exploring wav2vec 2.0 on speaker verification and language identification , 2020, Interspeech.
[83] Elizabeth K. Johnson,et al. Word Segmentation by 8-Month-Olds: When Speech Cues Count More Than Statistics , 2001 .
[84] Mary R. Newsome,et al. The Beginnings of Word Segmentation in English-Learning Infants , 1999, Cognitive Psychology.
[85] Alexei Baevski,et al. Adaptive Input Representations for Neural Language Modeling , 2018, ICLR.
[86] Kuan-Yu Chen,et al. Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models , 2019 .
[87] Shuang Xu,et al. Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[88] Awni Hannun,et al. Sequence Modeling with CTC , 2017 .
[89] Sanjeev Khudanpur,et al. Semi-Supervised Training of Acoustic Models Using Lattice-Free MMI , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[90] Titouan Parcollet,et al. The Pytorch-kaldi Speech Recognition Toolkit , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[91] Thierry Moudenc,et al. Speech technologies for african languages: example of a multilingual calculator for education , 2015, INTERSPEECH.
[92] Tomoki Toda,et al. Back-Translation-Style Data Augmentation for end-to-end ASR , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[93] Lukás Burget,et al. Semi-Supervised DNN Training with Word Selection for ASR , 2017, INTERSPEECH.
[94] Ben Poole,et al. Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.
[95] Xiangang Li,et al. Improving Transformer-based Speech Recognition Using Unsupervised Pre-training , 2019, ArXiv.
[96] Ondrej Bojar,et al. Improving Translation Model by Monolingual Data , 2011, WMT@EMNLP.
[97] James R. Glass,et al. Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces , 2018, NeurIPS.
[98] Alexei Baevski,et al. Effectiveness of self-supervised pre-training for speech recognition , 2019, ArXiv.
[99] Karen Simonyan,et al. The challenge of realistic music generation: modelling raw audio at scale , 2018, NeurIPS.
[100] Luciana Ferrer,et al. Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings , 2021, Interspeech.
[101] Gerald Penn,et al. Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[102] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[103] Guillaume Lample,et al. Word Translation Without Parallel Data , 2017, ICLR.
[104] Alexei Baevski,et al. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations , 2019, ICLR.
[105] Awni Hannun,et al. Self-Training for End-to-End Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[106] Gabriel Synnaeve,et al. Wav2Letter++: A Fast Open-source Speech Recognition System , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[107] Gabriel Synnaeve,et al. Self-Training and Pre-Training are Complementary for Speech Recognition , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[108] Juan Pino,et al. Large-Scale Self- and Semi-Supervised Learning for Speech Translation , 2021, Interspeech.
[109] Edouard Grave,et al. End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures , 2019, ArXiv.
[110] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[111] S. Young. Large Vocabulary Continuous Speech Recognition: a Review , 1996 .
[112] Myle Ott,et al. Understanding Back-Translation at Scale , 2018, EMNLP.
[113] J. Werker,et al. Cross-language speech perception: Evidence for perceptual reorganization during the first year of life , 1984 .
[114] Solomon Teferra Abate,et al. An Amharic speech corpus for large vocabulary continuous speech recognition , 2005, INTERSPEECH.
[115] Jian Wang,et al. Neural Network Language Modeling with Letter-Based Features and Importance Sampling , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[116] Zheng-Hua Tan,et al. rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method , 2020, Comput. Speech Lang..
[117] Hung-yi Lee,et al. Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[118] Tomohide Shibata (柴田 知秀). Understand It in 5 Minutes!? A Quick Read of Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2020 .