暂无分享,去创建一个
Kyu J. Han | Kilian Q. Weinberger | Yoav Artzi | Kwangyoun Kim | Felix Wu | Jing Pan | Kyu Han | Felix Wu | Yoav Artzi | Kwangyoun Kim | Jing Pan
[1] Quoc V. Le,et al. Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition , 2020, ArXiv.
[2] Yonghui Wu,et al. ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context , 2020, INTERSPEECH.
[3] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[4] Gabriel Synnaeve,et al. Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training , 2021, Interspeech.
[5] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[6] Yann Dauphin,et al. Language Modeling with Gated Convolutional Networks , 2016, ICML.
[7] Kjell Schubert,et al. Transformer-Transducer: End-to-End Speech Recognition with Self-Attention , 2019, ArXiv.
[8] Kunio Kashino,et al. BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation , 2021, 2021 International Joint Conference on Neural Networks (IJCNN).
[9] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[10] Ruslan Salakhutdinov,et al. Hubert: How Much Can a Bad Teacher Benefit ASR Pre-Training? , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[12] Yannick Estève,et al. TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation , 2018, SPECOM.
[13] Samuel R. Bowman,et al. Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks , 2018, ArXiv.
[14] Alexei Baevski,et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations , 2020, NeurIPS.
[15] Gabriel Synnaeve,et al. Iterative Pseudo-Labeling for Speech Recognition , 2020, INTERSPEECH.
[16] Jia Guo,et al. MusiCoder: A Universal Music-Acoustic Encoder Based on Transformers , 2021, MMM.
[17] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[18] Yu Zhang,et al. Conformer: Convolution-augmented Transformer for Speech Recognition , 2020, INTERSPEECH.
[19] E. Gumbel. Statistical Theory of Extreme Values and Some Practical Applications : A Series of Lectures , 1954 .
[20] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[21] Kaiming He,et al. Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[23] Shinji Watanabe,et al. ESPnet: End-to-End Speech Processing Toolkit , 2018, INTERSPEECH.
[24] James Glass,et al. PSLA: Improving Audio Event Classification with Pretraining, Sampling, Labeling, and Aggregation , 2021, ArXiv.
[25] Edouard Grave,et al. Reducing Transformer Depth on Demand with Structured Dropout , 2019, ICLR.
[26] Ronan Collobert,et al. wav2vec: Unsupervised Pre-training for Speech Recognition , 2019, INTERSPEECH.
[27] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[28] Aäron van den Oord,et al. Multimodal Self-Supervised Learning of General Audio Representations , 2021, ArXiv.
[29] Ross B. Girshick,et al. Fast and Accurate Model Scaling , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Emmanuel Dupoux,et al. VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation , 2021, ACL.
[31] Emmanuel Dupoux,et al. A Temporal Coherence Loss Function for Learning Unsupervised Acoustic Embeddings , 2016, SLTU.
[32] Andrew J. Viterbi,et al. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.
[33] Ashish Vaswani,et al. Self-Attention with Relative Position Representations , 2018, NAACL.
[34] Lin-Shan Lee,et al. SpeechBERT: An Audio-and-Text Jointly Learned Language Model for End-to-End Spoken Question Answering , 2019, INTERSPEECH.
[35] Ronan Collobert,et al. Unsupervised Cross-lingual Representation Learning for Speech Recognition , 2020, Interspeech.
[36] Qian Zhang,et al. Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Neil Zeghidour,et al. Contrastive Learning of General-Purpose Audio Representations , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[39] Alexei Baevski,et al. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations , 2019, ICLR.
[40] Gabriel Synnaeve,et al. Self-Training and Pre-Training are Complementary for Speech Recognition , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[41] Juan Pino,et al. Large-Scale Self- and Semi-Supervised Learning for Speech Translation , 2021, Interspeech.
[42] James R. Glass,et al. Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces , 2018, NeurIPS.
[43] Alexei Baevski,et al. Effectiveness of self-supervised pre-training for speech recognition , 2019, ArXiv.
[44] Gabriel Synnaeve,et al. Wav2Letter: an End-to-End ConvNet-based Speech Recognition System , 2016, ArXiv.
[45] Luciana Ferrer,et al. Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings , 2021, Interspeech.
[46] Gerald Penn,et al. Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[47] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[48] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[49] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[50] Chong Wang,et al. Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[51] Alec Radford,et al. Scaling Laws for Neural Language Models , 2020, ArXiv.
[52] Ben Poole,et al. Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.
[53] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[54] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.
[55] Armand Joulin,et al. Libri-Light: A Benchmark for ASR with Limited or No Supervision , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[56] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[57] Kilian Q. Weinberger,et al. Deep Networks with Stochastic Depth , 2016, ECCV.
[58] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[59] Jianfeng Gao,et al. DeBERTa: Decoding-enhanced BERT with Disentangled Attention , 2020, ICLR.
[60] Michal Valko,et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning , 2020, NeurIPS.
[61] Shuang Xu,et al. Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[62] Guangsen Wang,et al. Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks , 2020, INTERSPEECH.
[63] Daniel Yang,et al. A Deeper Look at Sheet Music Composer Classification Using Self-Supervised Pretraining , 2021, Applied Sciences.
[64] Lukás Burget,et al. Recurrent neural network based language model , 2010, INTERSPEECH.
[65] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.