Tara N. Sainath | Bhuvana Ramabhadran | Aren Jansen | Quoc V. Le | Khe Chai Sim | William Chan | Jiahui Yu | Chung-Cheng Chiu | Yonghui Wu | Ruoming Pang | Françoise Beaufays | Joel Shor | Anmol Gulati | Yu Zhang | Zongwei Zhou | Wei Han | Bo Li | James Qin | Yuanzhong Xu | Yanping Huang | Daniel S. Park | Shibo Wang | Min Ma | Yongqiang Wang | Liangliang Cao | Zhifeng Chen
[1] Hank Liao,et al. Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[2] Gabriel Synnaeve,et al. Iterative Pseudo-Labeling for Speech Recognition , 2020, INTERSPEECH.
[3] Yoshua Bengio,et al. Twin Regularization for online speech recognition , 2018, INTERSPEECH.
[4] Hisashi Kawai,et al. An Investigation of a Knowledge Distillation Method for CTC Acoustic Models , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Jinyu Li,et al. Improved training for online end-to-end speech recognition systems , 2017, INTERSPEECH.
[6] Andrew W. Senior,et al. Fast and accurate recurrent neural network acoustic models for speech recognition , 2015, INTERSPEECH.
[7] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[8] Mike Schuster,et al. Japanese and Korean voice search , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Alexei Baevski,et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations , 2020, NeurIPS.
[10] Sree Hari Krishnan Parthasarathi,et al. Lessons from Building Acoustic Models with a Million Hours of Speech , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Stylianos Asteriadis,et al. Multimodal and Temporal Perception of Audio-visual Cues for Emotion Recognition , 2019, 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII).
[12] Andrew Zisserman,et al. Broaden Your Views for Self-Supervised Video Learning , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[13] Kenneth Ward Church,et al. Deep neural network features and semi-supervised training for low resource speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[14] Alexei Baevski,et al. Effectiveness of self-supervised pre-training for speech recognition , 2019, ArXiv.
[15] Arun Narayanan,et al. Improving Streaming Automatic Speech Recognition with Non-Streaming Model Distillation on Unsupervised Data , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Salim Hariri,et al. Paralinguistic Classification of Mask Wearing by Image Classifiers and Fusion , 2020, INTERSPEECH.
[17] Pete Warden,et al. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition , 2018, ArXiv.
[18] Quoc V. Le,et al. Improved Noisy Student Training for Automatic Speech Recognition , 2020, INTERSPEECH.
[19] Tara N. Sainath,et al. A Better and Faster end-to-end Model for Streaming ASR , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Armand Joulin,et al. Libri-Light: A Benchmark for ASR with Limited or No Supervision , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Gaël Varoquaux,et al. Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..
[22] Ronan Collobert,et al. wav2vec: Unsupervised Pre-training for Speech Recognition , 2019, INTERSPEECH.
[23] James R. Glass,et al. Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Joon Son Chung,et al. VoxCeleb: A Large-Scale Speaker Identification Dataset , 2017, INTERSPEECH.
[25] Paul Deléglise,et al. TED-LIUM: an Automatic Speech Recognition dedicated corpus , 2012, LREC.
[26] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[27] Yannick Estève,et al. TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation , 2018, SPECOM.
[28] Edouard Grave,et al. End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures , 2019, ArXiv.
[29] Bhuvana Ramabhadran,et al. Injecting Text in Self-Supervised Speech Pretraining , 2021, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[30] George Zavaliagkos,et al. Utilizing untranscribed training data to improve performance , 1998, LREC.
[31] Francis M. Tyers,et al. Common Voice: A Massively-Multilingual Speech Corpus , 2020, LREC.
[32] Heung-Seon Oh,et al. Wav2KWS: Transfer Learning From Speech Representations for Keyword Spotting , 2021, IEEE Access.
[33] Yuzong Liu,et al. Deep Contextualized Acoustic Representations for Semi-Supervised Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Yoshua Bengio,et al. On Using Monolingual Corpora in Neural Machine Translation , 2015, ArXiv.
[35] Joel Shor,et al. FRILL: A Non-Semantic Speech Embedding for Mobile Devices , 2021, INTERSPEECH.
[36] Ragini Verma,et al. CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset , 2014, IEEE Transactions on Affective Computing.
[37] David Yarowsky,et al. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods , 1995, ACL.
[38] Aren Jansen,et al. Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Richard M. Schwartz,et al. Analysis of low-resource acoustic model self-training , 2009, INTERSPEECH.
[40] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[41] H. Zen,et al. Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation , 2021, INTERSPEECH.
[42] H. J. Scudder,et al. Probability of error of some adaptive pattern-recognition machines , 1965, IEEE Trans. Inf. Theory.
[43] Tara N. Sainath,et al. Universal ASR: Unify and Improve Streaming ASR with Full-context Modeling , 2020, ArXiv.
[44] Noam Shazeer,et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost , 2018, ICML.
[45] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[46] Dehao Chen,et al. Automatic Cross-Replica Sharding of Weight Update in Data-Parallel Training , 2020, ArXiv.
[47] Philip J. B. Jackson,et al. Speaker-dependent audio-visual emotion recognition , 2009, AVSP.
[48] Mohammad Shoeybi,et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism , 2019, ArXiv.
[49] George Saon,et al. Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-End Speech Recognition , 2020, INTERSPEECH.
[50] Tara N. Sainath,et al. RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[51] Subhashini Venugopalan,et al. Comparing Supervised Models And Learned Speech Representations For Classifying Intelligibility Of Disordered Speech On Selected Phrases , 2021, INTERSPEECH.
[52] Alexei Baevski,et al. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations , 2019, ICLR.
[53] Awni Hannun,et al. Self-Training for End-to-End Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] Karol J. Piczak. ESC: Dataset for Environmental Sound Classification , 2015, ACM Multimedia.
[55] Gabriel Synnaeve,et al. Self-Training and Pre-Training are Complementary for Speech Recognition , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[56] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[57] Noam Shazeer,et al. GSPMD: General and Scalable Parallelization for ML Computation Graphs , 2021, ArXiv.
[58] Orhan Firat,et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding , 2020, ICLR.
[59] Mohammad Norouzi,et al. SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network , 2021, ArXiv.
[60] Quoc V. Le,et al. SpecAugment on Large Scale Datasets , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[61] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[62] Jon Barker,et al. CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings , 2020, 6th International Workshop on Speech Processing in Everyday Environments (CHiME 2020).
[63] Dustin Tran,et al. Mesh-TensorFlow: Deep Learning for Supercomputers , 2018, NeurIPS.
[64] Björn W. Schuller,et al. The INTERSPEECH 2020 Computational Paralinguistics Challenge: Elderly Emotion, Breathing & Masks , 2020, INTERSPEECH.
[65] Tara N. Sainath,et al. Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection , 2016, INTERSPEECH.
[66] Hao Tang,et al. An Unsupervised Autoregressive Model for Speech Representation Learning , 2019, INTERSPEECH.
[67] Gabriel Synnaeve,et al. Semi-Supervised Speech Recognition via Local Prior Matching , 2020, ArXiv.
[68] Yuri Y. Khokhlov,et al. The STC System for the CHiME-6 Challenge , 2020, 6th International Workshop on Speech Processing in Everyday Environments (CHiME 2020).
[69] Tara N. Sainath,et al. Scaling End-to-End Models for Large-Scale Multilingual ASR , 2021, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[70] Ellen Riloff,et al. Learning Extraction Patterns for Subjective Expressions , 2003, EMNLP.
[71] Jean-Luc Gauvain,et al. Lightly Supervised Acoustic Model Training , 2000 .
[72] Aren Jansen,et al. Unsupervised Learning of Semantic Audio Representations , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[73] Frank Rudzicz,et al. On the importance of normative data in speech-based assessment , 2017, ArXiv.
[74] Christopher Joseph Pal,et al. Twin Networks: Matching the Future for Sequence Generation , 2017, ICLR.
[75] Olivier Siohan,et al. Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models , 2021, INTERSPEECH.
[76] Reinhold Haeb-Umbach,et al. Front-end processing for the CHiME-5 dinner party scenario , 2018, 5th International Workshop on Speech Processing in Everyday Environments (CHiME 2018).
[77] Kartik Audhkhasi,et al. Improved Knowledge Distillation from Bi-Directional to Uni-Directional LSTM CTC for End-to-End Speech Recognition , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[78] Armand Joulin,et al. Unsupervised Pretraining Transfers Well Across Languages , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[79] Qian Zhang,et al. Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[80] Quoc V. Le,et al. Self-Training With Noisy Student Improves ImageNet Classification , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[81] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[82] Naoyuki Kanda,et al. Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone , 2021, INTERSPEECH.
[83] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[84] James R. Glass,et al. Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech , 2018, INTERSPEECH.
[85] Jean Carletta,et al. The AMI Meeting Corpus: A Pre-announcement , 2005, MLMI.
[86] Chris Dyer,et al. Learning Robust and Multilingual Speech Representations , 2020, Findings of EMNLP.
[87] Aren Jansen,et al. Towards Learning a Universal Non-Semantic Representation of Speech , 2020, INTERSPEECH.
[88] Aäron van den Oord,et al. Multi-Format Contrastive Learning of Audio Representations , 2021, ArXiv.
[89] Yu Zhang,et al. Conformer: Convolution-augmented Transformer for Speech Recognition , 2020, INTERSPEECH.
[90] Arun Narayanan,et al. Toward Domain-Invariant Speech Recognition via Large Scale Training , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[91] Brian Kingsbury,et al. On the limit of English conversational speech recognition , 2021, INTERSPEECH.
[92] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[93] Tara N. Sainath,et al. Streaming End-to-end Speech Recognition for Mobile Devices , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[94] Olatunji Ruwase,et al. ZeRO: Memory optimizations Toward Training Trillion Parameter Models , 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[95] Tara N. Sainath,et al. Semi-supervised Training for End-to-end Models via Weak Distillation , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[96] Quoc V. Le,et al. Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition , 2020, ArXiv.
[97] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[98] Govind Mittal,et al. Spoken Language Identification using ConvNets , 2019, AmI.
[99] Gabriel Synnaeve,et al. Rethinking Evaluation in ASR: Are Our Models Robust Enough? , 2020, INTERSPEECH.
[100] Ron J. Weiss,et al. Unsupervised Speech Representation Learning Using WaveNet Autoencoders , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.