wav2vec: Unsupervised Pre-training for Speech Recognition
暂无分享,去创建一个
Ronan Collobert | Steffen Schneider | Alexei Baevski | Michael Auli | Ronan Collobert | Michael Auli | Alexei Baevski | Steffen Schneider
[1] Sergey Edunov,et al. Pre-trained language model representations for language generation , 2019, NAACL.
[2] Gabriel Synnaeve,et al. Wav2Letter: an End-to-End ConvNet-based Speech Recognition System , 2016, ArXiv.
[3] Dario Pavllo,et al. 3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[4] James R. Glass,et al. Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces , 2018, NeurIPS.
[5] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[6] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.
[7] Philipp Koehn,et al. Scalable Modified Kneser-Ney Language Model Estimation , 2013, ACL.
[8] Ron J. Weiss,et al. Unsupervised Speech Representation Learning Using WaveNet Autoencoders , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[9] Ali Razavi,et al. Data-Efficient Image Recognition with Contrastive Predictive Coding , 2019, ICML.
[10] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[11] Alexei A. Efros,et al. Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[12] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[13] Ronan Collobert,et al. Wav2Letter++: A Fast Open-source Speech Recognition System , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Jonathan G. Fiscus,et al. Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST , 1993 .
[15] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Restarts , 2016, ArXiv.
[16] Emmanuel Dupoux,et al. A Temporal Coherence Loss Function for Learning Unsupervised Acoustic Embeddings , 2016, SLTU.
[17] Yann Dauphin,et al. Language Modeling with Gated Convolutional Networks , 2016, ICML.
[18] Jason Weston,et al. Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..
[19] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[20] Julius Kunze,et al. Transfer Learning for Speech Recognition on a Budget , 2017, Rep4NLP@ACL.
[21] Yoshua Bengio,et al. Light Gated Recurrent Units for Speech Recognition , 2018, IEEE Transactions on Emerging Topics in Computational Intelligence.
[22] Iasonas Kokkinos,et al. Learning Filterbanks from Raw Speech for Phone Recognition , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Jian Huang,et al. Improving speech emotion recognition via Transformer-based Predictive Coding through transfer learning , 2018, ArXiv.
[24] Yoshua Bengio,et al. Learning Speaker Representations with Mutual Information , 2018, INTERSPEECH.
[25] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .
[26] Chong Wang,et al. Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[27] Sanjeev Khudanpur,et al. Investigation of transfer learning for ASR using LF-MMI trained neural networks , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[28] Gabriel Synnaeve,et al. Who Needs Words? Lexicon-Free Speech Recognition , 2019, INTERSPEECH.
[29] Steve J. Young,et al. Large vocabulary continuous speech recognition using HTK , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.
[30] Guillaume Lample,et al. Cross-lingual Language Model Pretraining , 2019, NeurIPS.
[31] Gabriel Synnaeve,et al. Wav2Letter++: A Fast Open-source Speech Recognition System , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Aren Jansen,et al. A segmental framework for fully-unsupervised large-vocabulary speech recognition , 2016, Comput. Speech Lang..
[33] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Sanjeev Khudanpur,et al. End-to-end Speech Recognition Using Lattice-free MMI , 2018, INTERSPEECH.
[35] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[36] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[37] Lin-Shan Lee,et al. Almost-unsupervised Speech Recognition with Close-to-zero Resource Based on Phonetic Structures Learned from Very Small Unpaired Speech and Text Data , 2018, ArXiv.
[38] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[39] Luke S. Zettlemoyer,et al. Cloze-driven Pretraining of Self-attention Networks , 2019, EMNLP.