Multi-Task Self-Supervised Learning for Robust Speech Recognition
暂无分享,去创建一个
Yoshua Bengio | Mirco Ravanelli | Santiago Pascual | Jianyuan Zhong | Jan Trmal | Pawel Swietojanski | Joao Monteiro
[1] Steve Renals,et al. Multi-level adaptive networks in tandem and hybrid ASR systems , 2013, ICASSP.
[2] Maurizio Omologo,et al. The DIRHA-ENGLISH corpus and related tasks for distant-speech recognition in domestic environments , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[3] Hermann Ney,et al. Gammatone Features and Feature Combination for Large Vocabulary Speech Recognition , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[4] Kevin Gimpel,et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations , 2019, ICLR.
[5] Geoffrey E. Hinton,et al. Phoneme recognition using time-delay neural networks , 1989, IEEE Trans. Acoust. Speech Signal Process..
[6] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[7] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[8] Yoshua Bengio,et al. Interpretable Convolutional Filters with SincNet , 2018, ArXiv.
[9] Martial Hebert,et al. Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification , 2016, ECCV.
[10] Yoshua Bengio,et al. Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks , 2019, INTERSPEECH.
[11] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[12] Maurizio Omologo,et al. Contaminated speech training methods for robust DNN-HMM distant speech recognition , 2017, INTERSPEECH.
[13] Ioannis Mitliagkas,et al. Multi-objective training of Generative Adversarial Networks with multiple discriminators , 2019, ICML.
[14] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[15] Richard Socher,et al. Quasi-Recurrent Neural Networks , 2016, ICLR.
[16] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[17] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[19] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[20] Alexei A. Efros,et al. Colorful Image Colorization , 2016, ECCV.
[21] Yoshua Bengio,et al. Improving Speech Recognition by Revising Gated Recurrent Units , 2017, INTERSPEECH.
[22] Maurizio Omologo,et al. Realistic Multi-Microphone Data Simulation for Distant Speech Recognition , 2016, INTERSPEECH.
[23] Jonathan G. Fiscus,et al. Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST , 1993 .
[24] Aren Jansen,et al. Unsupervised Learning of Semantic Audio Representations , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Yoshua Bengio,et al. Learning deep representations by mutual information estimation and maximization , 2018, ICLR.
[26] Andrew Zisserman,et al. Multi-task Self-Supervised Visual Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[27] Rong Ge,et al. Rethinking learning rate schedules for stochastic optimization , 2018 .
[28] Yoshua Bengio,et al. Learning Speaker Representations with Mutual Information , 2018, INTERSPEECH.
[29] Yoshua Bengio,et al. Mutual Information Neural Estimation , 2018, ICML.
[30] Jon Barker,et al. The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines , 2018, INTERSPEECH.
[31] Titouan Parcollet,et al. The Pytorch-kaldi Speech Recognition Toolkit , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Sanjeev Khudanpur,et al. A time delay neural network architecture for efficient modeling of long temporal contexts , 2015, INTERSPEECH.
[33] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[34] Jon Barker,et al. The second ‘chime’ speech separation and recognition challenge: Datasets, tasks and baselines , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[35] Ron J. Weiss,et al. Unsupervised Speech Representation Learning Using WaveNet Autoencoders , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[36] Paolo Favaro,et al. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.
[37] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[38] Thomas Hofmann,et al. Greedy Layer-Wise Training of Deep Networks , 2007 .
[39] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[40] Jont B. Allen,et al. Image method for efficiently simulating small‐room acoustics , 1976 .