暂无分享,去创建一个
[1] Yonghui Wu,et al. ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context , 2020, INTERSPEECH.
[2] Edouard Grave,et al. End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures , 2019, ArXiv.
[3] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[4] Sergey Rybin,et al. You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation , 2020, 2020 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI).
[5] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[6] Karen Simonyan,et al. The challenge of realistic music generation: modelling raw audio at scale , 2018, NeurIPS.
[7] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[8] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[9] Haizhou Li,et al. VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019 , 2019, INTERSPEECH.
[10] Hao Tang,et al. An Unsupervised Autoregressive Model for Speech Representation Learning , 2019, INTERSPEECH.
[11] Myle Ott,et al. Scaling Neural Machine Translation , 2018, WMT.
[12] Hung-yi Lee,et al. Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] R Devon Hjelm,et al. Learning Representations by Maximizing Mutual Information Across Views , 2019, NeurIPS.
[14] Kaiming He,et al. Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Aapo Hyvärinen,et al. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models , 2010, AISTATS.
[16] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[17] Gabriel Synnaeve,et al. Wav2Letter++: A Fast Open-source Speech Recognition System , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Yu Zhang,et al. Conformer: Convolution-augmented Transformer for Speech Recognition , 2020, INTERSPEECH.
[19] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[20] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Ben Poole,et al. Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.
[22] Alexei Baevski,et al. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations , 2019, ICLR.
[23] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[24] Gabriel Synnaeve,et al. Iterative Pseudo-Labeling for Speech Recognition , 2020, INTERSPEECH.
[25] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[26] Tom Minka,et al. A* Sampling , 2014, NIPS.
[27] Yoshua Bengio,et al. Light Gated Recurrent Units for Speech Recognition , 2018, IEEE Transactions on Emerging Topics in Computational Intelligence.
[28] Karen Livescu,et al. Unsupervised Pre-Training of Bidirectional Speech Encoders via Masked Reconstruction , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Armand Joulin,et al. Unsupervised Pretraining Transfers Well Across Languages , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Edouard Grave,et al. Reducing Transformer Depth on Demand with Structured Dropout , 2019, ICLR.
[31] Luke S. Zettlemoyer,et al. Transformers with convolutional context for ASR , 2019, ArXiv.
[32] Qian Zhang,et al. Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Alexei Baevski,et al. Adaptive Input Representations for Neural Language Modeling , 2018, ICLR.
[34] Iasonas Kokkinos,et al. Learning Filterbanks from Raw Speech for Phone Recognition , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Yoshua Bengio,et al. Multi-Task Self-Supervised Learning for Robust Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Armand Joulin,et al. Libri-Light: A Benchmark for ASR with Limited or No Supervision , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Mike Schuster,et al. Japanese and Korean voice search , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Jonathan G. Fiscus,et al. Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST , 1993 .
[39] James Glass,et al. Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech , 2020, ICLR.
[40] Kevin Gimpel,et al. Gaussian Error Linear Units (GELUs) , 2016 .
[41] Xiangang Li,et al. Improving Transformer-based Speech Recognition Using Unsupervised Pre-training , 2019, ArXiv.
[42] Yann Dauphin,et al. Pay Less Attention with Lightweight and Dynamic Convolutions , 2019, ICLR.
[43] Cordelia Schmid,et al. Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[44] Alexei Baevski,et al. Effectiveness of self-supervised pre-training for speech recognition , 2019, ArXiv.
[45] P. Lewis. Ethnologue : languages of the world , 2009 .
[46] Laurens van der Maaten,et al. Self-Supervised Learning of Pretext-Invariant Representations , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[47] E. Gumbel. Statistical Theory of Extreme Values and Some Practical Applications : A Series of Lectures , 1954 .
[48] Kilian Q. Weinberger,et al. Deep Networks with Stochastic Depth , 2016, ECCV.
[49] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[50] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[51] Oriol Vinyals,et al. Neural Discrete Representation Learning , 2017, NIPS.
[52] Quoc V. Le,et al. Improved Noisy Student Training for Automatic Speech Recognition , 2020, INTERSPEECH.
[53] Ronan Collobert,et al. wav2vec: Unsupervised Pre-training for Speech Recognition , 2019, INTERSPEECH.
[54] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[55] Hermann Ney,et al. RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation , 2019, INTERSPEECH.
[56] Ewald van der Westhuizen,et al. Unsupervised acoustic unit discovery for speech synthesis using discrete latent-variable neural networks , 2019, INTERSPEECH.