[1] George Sterpu,et al. Learning to Count Words in Fluent Speech enables Online Speech Recognition , 2020, ArXiv.
[2] Bhuvana Ramabhadran,et al. Multilingual Speech Recognition with Self-Attention Structured Parameterization , 2020, INTERSPEECH.
[3] Tara N. Sainath,et al. Emitting Word Timings with End-to-End Models , 2020, INTERSPEECH.
[4] Hairong Liu,et al. Exploring neural transducers for end-to-end speech recognition , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[5] Jonathan Le Roux,et al. An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Han Lu,et al. End-To-End Multi-Talker Overlapping Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Takuya Yoshioka,et al. Advances in Online Audio-Visual Meeting Transcription , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[8] Fang Deng,et al. End-to-End Code-Switching ASR for Low-Resourced Language Pairs , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[9] Georg Heigold,et al. Multilingual acoustic models using distributed deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[10] Mohan Li,et al. End-to-end Speech Recognition with Adaptive Computation Steps , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Gil Keren,et al. Alignment Restricted Streaming Recurrent Neural Network Transducer , 2021, 2021 IEEE Spoken Language Technology Workshop (SLT).
[12] Daehyun Kim,et al. Attention Based On-Device Streaming Speech Recognition with Large Speech Corpus , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[13] Lei Xie,et al. Cascade RNN-Transducer: Syllable Based Streaming On-Device Mandarin Speech Recognition with a Syllable-To-Character Converter , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[14] Yu Zhang,et al. Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM , 2017, INTERSPEECH.
[15] Cyril Allauzen,et al. Hybrid Autoregressive Transducer (HAT) , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Daehyun Kim,et al. Iterative Compression of End-to-End ASR Model using AutoML , 2020, INTERSPEECH.
[17] Jinyu Li,et al. Improved training for online end-to-end speech recognition systems , 2017, INTERSPEECH.
[18] Horia Cucu,et al. An Evaluation of Word-Level Confidence Estimation for End-to-End Automatic Speech Recognition , 2021, 2021 IEEE Spoken Language Technology Workshop (SLT).
[19] Yoshua Bengio,et al. End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results , 2014, ArXiv.
[20] Xiong Xiao,et al. Developing Far-Field Speaker System Via Teacher-Student Learning , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Tara N. Sainath,et al. Recognizing Long-Form Speech Using Streaming End-to-End Models , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[22] Geoffrey Zweig,et al. Advances in all-neural speech recognition , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Maja Pantic,et al. End-to-End Audiovisual Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Naoyuki Kanda,et al. On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer , 2020, Interspeech.
[25] Brian Kingsbury,et al. Advancing RNN Transducer Technology for Speech Recognition , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.
[27] Yanmin Qian,et al. Exploring Model Units and Training Strategies for End-to-End Speech Recognition , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[28] Naoyuki Kanda,et al. Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition , 2021, Interspeech.
[29] Tatsuya Kawahara,et al. Transfer Learning of Language-independent End-to-end ASR with Language Model Fusion , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Navdeep Jaitly,et al. Sequence-to-Sequence Models Can Directly Translate Foreign Speech , 2017, INTERSPEECH.
[31] Florian Metze,et al. Towards Context-Aware End-to-End Code-Switching Speech Recognition , 2020, INTERSPEECH.
[32] Satoshi Nakamura,et al. Listening while speaking: Speech chain by deep learning , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[33] Furu Wei,et al. UniSpeech at scale: An Empirical Study of Pre-training Method on Large-Scale Speech Recognition Dataset , 2021, arXiv:2107.05233.
[34] George Saon,et al. Alignment-Length Synchronous Decoding for RNN Transducer , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Qian Zhang,et al. Transformer Transducer: One Model Unifying Streaming and Non-streaming Speech Recognition , 2020, ArXiv.
[36] Tara N. Sainath,et al. Multitask Training with Text Data for End-to-End Speech Recognition , 2020, Interspeech.
[37] Yu Zhang,et al. Highway long short-term memory RNNS for distant speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Wei Chu,et al. Hybrid CTC-Attention based End-to-End Speech Recognition using Subword Units , 2018, 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP).
[39] Brian Kingsbury,et al. Building Competitive Direct Acoustics-to-Word Models for English Conversational Speech Recognition , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Philip C. Woodland,et al. Integrating Source-Channel and Attention-Based Sequence-to-Sequence Models for Speech Recognition , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[41] Tara N. Sainath,et al. A Comparison of Sequence-to-Sequence Models for Speech Recognition , 2017, INTERSPEECH.
[42] Jonathan Le Roux,et al. Streaming Automatic Speech Recognition with the Transformer Model , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[43] Jonathan Le Roux,et al. Transformer-Based Long-Context End-to-End Speech Recognition , 2020, INTERSPEECH.
[44] Shinji Watanabe,et al. Auxiliary Feature Based Adaptation of End-to-end ASR Systems , 2018, INTERSPEECH.
[45] Kartik Audhkhasi,et al. Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation , 2019, INTERSPEECH.
[46] Tara N. Sainath,et al. Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling , 2020, ICLR.
[47] Brian Kingsbury,et al. 4-bit Quantization of LSTM-based Speech Recognition Models , 2021, Interspeech.
[48] Titouan Parcollet,et al. SpeechBrain: A General-Purpose Speech Toolkit , 2021, ArXiv.
[49] Yajie Miao,et al. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[50] Florian Metze,et al. Subword and Crossword Units for CTC Acoustic Models , 2017, INTERSPEECH.
[51] Jinyu Li,et al. Improving Multilingual Transformer Transducer Models by Reducing Language Confusions , 2021, Interspeech.
[52] Hermann Ney,et al. Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[53] Naoyuki Kanda,et al. Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] Shinji Watanabe,et al. Gaussian Kernelized Self-Attention for Long Sequence Data and its Application to CTC-Based Speech Recognition , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[55] Liang Lu,et al. Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition , 2017, INTERSPEECH.
[56] Hieu Duy Nguyen,et al. Quantization Aware Training with Absolute-Cosine Regularization for Automatic Speech Recognition , 2020, INTERSPEECH.
[57] Mike Schuster,et al. Japanese and Korean voice search , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[58] Chao Weng,et al. Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[59] Alexei Baevski,et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations , 2020, NeurIPS.
[60] Tara N. Sainath,et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[61] Hermann Ney,et al. Improved training of end-to-end attention models for speech recognition , 2018, INTERSPEECH.
[62] Shinji Watanabe,et al. Joint CTC-attention based end-to-end speech recognition using multi-task learning , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[63] Athanasios Mouchtaris,et al. CoDERT: Distilling Encoder Representations with Co-learning for Transducer-based Speech Recognition , 2021, Interspeech.
[64] Yifan Gong,et al. An Overview of Noise-Robust Automatic Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[65] Hao Tang,et al. An Unsupervised Autoregressive Model for Speech Representation Learning , 2019, INTERSPEECH.
[66] Ivan Medennikov,et al. Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription , 2020, INTERSPEECH.
[67] Nicolas Usunier,et al. End-to-End Speech Recognition From the Raw Waveform , 2018, INTERSPEECH.
[68] Yifan Gong,et al. Advancing Connectionist Temporal Classification with Attention Modeling , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[69] Geoffrey Zweig,et al. Benchmarking LF-MMI, CTC And RNN-T Criteria For Streaming ASR , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[70] Yulan Liu,et al. Streaming Multi-Speaker ASR with RNN-T , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[71] Tara N. Sainath,et al. Multilingual Speech Recognition with a Single End-to-End Model , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[72] Liang Lu,et al. Deep beamforming networks for multi-channel speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[73] Yifan Gong,et al. Improving RNN Transducer Modeling for End-to-End Speech Recognition , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[74] Hagen Soltau,et al. Understanding Medical Conversations: Rich Transcription, Confidence Scores & Information Extraction , 2021, Interspeech.
[75] Shinji Watanabe,et al. End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[76] Gil Keren,et al. Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion , 2021, Interspeech 2021.
[77] Matt Shannon,et al. Optimizing Expected Word Error Rate via Sampling for Speech Recognition , 2017, INTERSPEECH.
[78] Khe Chai Sim,et al. Efficient Implementation of Recurrent Neural Network Transducer in Tensorflow , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[79] Khe Chai Sim,et al. An Investigation Into On-device Personalization of End-to-end Automatic Speech Recognition Models , 2019, INTERSPEECH.
[80] Hagen Soltau,et al. Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition , 2016, INTERSPEECH.
[81] Tara N. Sainath,et al. A Better and Faster end-to-end Model for Streaming ASR , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[82] Tatsuya Kawahara,et al. Distilling the Knowledge of BERT for Sequence-to-Sequence ASR , 2020, INTERSPEECH.
[83] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[84] Linhao Dong,et al. CIF: Continuous Integrate-And-Fire for End-To-End Speech Recognition , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[85] Athanasios Mouchtaris,et al. Phonetically Induced Subwords for End-to-End Speech Recognition , 2021, Interspeech 2021.
[86] Yongqiang Wang,et al. Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR , 2019, INTERSPEECH.
[87] Sheng Zhao,et al. A Light-weight contextual spelling correction model for customizing transducer-based speech recognition systems , 2021, Interspeech.
[88] Tara N. Sainath,et al. Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model , 2019, INTERSPEECH.
[89] Maja Pantic,et al. Audio-Visual Speech Recognition with a Hybrid CTC/Attention Architecture , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[90] Jonathan Le Roux,et al. Unsupervised Speaker Adaptation Using Attention-Based Speaker Memory for End-to-End ASR , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[91] Tara N. Sainath,et al. A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[92] Qian Zhang,et al. Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[93] Tara N. Sainath,et al. Shallow-Fusion End-to-End Contextual Biasing , 2019, INTERSPEECH.
[94] Tara N. Sainath,et al. BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition , 2021, ArXiv.
[95] Parisa Haghani,et al. Leveraging Language ID in Multilingual End-to-End Speech Recognition , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[96] Andreas Stolcke,et al. Joint ASR and Language Identification Using RNN-T: An Efficient Approach to Dynamic Language Switching , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[97] Bhuvana Ramabhadran,et al. End-to-end speech recognition and keyword search on low-resource languages , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[98] John R. Hershey,et al. Joint CTC/attention decoding for end-to-end speech recognition , 2017, ACL.
[99] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[100] Carlos Busso,et al. End-to-End Audiovisual Speech Recognition System With Multitask Learning , 2021, IEEE Transactions on Multimedia.
[101] Florian Metze,et al. Sequence-Based Multi-Lingual Low Resource Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[102] Gil Keren,et al. Deep Shallow Fusion for RNN-T Personalization , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[103] Richard Socher,et al. Improved Regularization Techniques for End-to-End Speech Recognition , 2017, ArXiv.
[104] Chao Weng,et al. Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition , 2021, Interspeech 2021.
[105] Richard Socher,et al. An Investigation of Phone-Based Subword Units for End-to-End Speech Recognition , 2020, INTERSPEECH.
[106] Ariya Rastrow,et al. Amortized Neural Networks for Low-Latency Speech Recognition , 2021, Interspeech.
[107] Takaaki Hori,et al. Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers , 2021, Interspeech.
[108] Bhuvana Ramabhadran,et al. Mixture of Informed Experts for Multilingual Speech Recognition , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[109] Andrew W. Senior,et al. Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.
[110] Andreas Schwarz,et al. Improving RNN-T ASR Accuracy Using Context Audio , 2020, Interspeech.
[111] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.
[112] Rich Caruana,et al. Model compression , 2006, KDD '06.
[113] Zhiheng Huang,et al. Self-attention Networks for Connectionist Temporal Classification in Speech Recognition , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[114] Brian Kingsbury,et al. Improving Customization of Neural Transducers by Mitigating Acoustic Mismatch of Synthesized Audio , 2021, Interspeech.
[115] Colin Raffel,et al. Monotonic Chunkwise Attention , 2017, ICLR.
[116] Daniel S. Park,et al. Efficient Knowledge Distillation for RNN-Transducer Models , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[117] Liang Lu,et al. On training the recurrent neural network encoder-decoder for large vocabulary end-to-end speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[118] Tetsunori Kobayashi,et al. Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict , 2020, INTERSPEECH.
[119] Hagen Soltau,et al. Joint Speech Recognition and Speaker Diarization via Sequence Transduction , 2019, INTERSPEECH.
[120] Joon Son Chung,et al. Deep Audio-Visual Speech Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[121] Jonathan Le Roux,et al. MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[122] Shinji Watanabe,et al. End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming , 2020, INTERSPEECH.
[123] Nanyun Peng,et al. Espresso: A Fast End-to-End Neural Speech Recognition Toolkit , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[124] Exploring End-to-End Multi-channel ASR with Bias Information for Meeting Transcription , 2020, ArXiv.
[125] Jonathan Le Roux,et al. Streaming End-to-End Speech Recognition with Joint CTC-Attention Based Models , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[126] Tara N. Sainath,et al. Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus , 2020, INTERSPEECH.
[127] Liangliang Cao,et al. Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[128] George Saon,et al. Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-End Speech Recognition , 2020, INTERSPEECH.
[129] Naoyuki Kanda,et al. Streaming Multi-talker Speech Recognition with Joint Speaker Identification , 2021, Interspeech.
[130] Dushyant Sharma,et al. Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition , 2021, Interspeech 2021.
[131] Hermann Ney,et al. CTC in the Context of Generalized Full-Sum HMM Training , 2017, INTERSPEECH.
[132] Roland Maas,et al. Streaming End-to-End Bilingual ASR Systems with Joint Language Identification , 2020, ArXiv.
[133] Xiao Chen,et al. Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition , 2020, INTERSPEECH.
[134] Jonathan Le Roux,et al. End-To-End Multi-Speaker Speech Recognition With Transformer , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[135] Xiaofei Wang,et al. Serialized Output Training for End-to-End Overlapped Speech Recognition , 2020, INTERSPEECH.
[136] Yonghong Yan,et al. Transformer-Based Online CTC/Attention End-To-End Speech Recognition Architecture , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[137] Jinyu Li,et al. A Configurable Multilingual Model is All You Need to Recognize All Languages , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[138] Yifan Gong,et al. Towards Code-switching ASR for End-to-end CTC Models , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[139] Chengyi Wang,et al. Low Latency End-to-End Streaming Speech Recognition with a Scout Network , 2020, INTERSPEECH.
[140] Steve Renals,et al. Adaptation Algorithms for Speech Recognition: An Overview , 2020, ArXiv.
[141] J. Tao,et al. Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition , 2020, INTERSPEECH.
[142] Srikanth Ronanki,et al. Transformer-Transducers for Code-Switched Speech Recognition , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[143] Hermann Ney,et al. Returnn: The RWTH extensible training framework for universal recurrent neural networks , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[144] Ruslan Salakhutdinov,et al. Hubert: How Much Can a Bad Teacher Benefit ASR Pre-Training? , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[145] Hung-yi Lee,et al. Meta Learning for End-To-End Low-Resource Speech Recognition , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[146] Tara N. Sainath,et al. No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[147] Suyoun Kim,et al. Towards Language-Universal End-to-End Speech Recognition , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[148] Patrick Nguyen,et al. Model Unit Exploration for Sequence-to-Sequence Speech Recognition , 2019, ArXiv.
[149] Yifan Gong,et al. Learning small-size DNN with output-distribution-based criteria , 2014, INTERSPEECH.
[150] Shinji Watanabe,et al. Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios , 2021, Interspeech.
[151] Hagen Soltau,et al. Monotonic Recurrent Neural Network Transducer and Decoding Strategies , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[152] Titouan Parcollet,et al. E2E-SINCNET: Toward Fully End-To-End Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[153] Jiangyan Yi,et al. Synchronous Transformers for end-to-end Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[154] Frank Zhang,et al. Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition , 2020, ArXiv.
[155] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[156] Puming Zhan,et al. Contextual Density Ratio for Language Model Biasing of Sequence to Sequence ASR Systems , 2021, Interspeech 2021.
[157] Khe Chai Sim,et al. Robust Continuous On-Device Personalization for Automatic Speech Recognition , 2021, Interspeech.
[158] Shinji Watanabe,et al. Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration , 2019, INTERSPEECH.
[159] Yifan Gong,et al. On Addressing Practical Challenges for RNN-Transducer , 2021, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[160] Steve Renals,et al. Learning Noise Invariant Features Through Transfer Learning For Robust End-to-End Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[161] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[162] Yifan Gong,et al. Acoustic-to-word model without OOV , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[163] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[164] Tara N. Sainath,et al. An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling , 2021, Interspeech.
[165] Yashesh Gaur,et al. Listen, Look and Deliberate: Visual Context-Aware Speech Recognition Using Pre-Trained Text-Video Representations , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[166] Shinji Watanabe,et al. Improving End-to-end Speech Recognition with Pronunciation-assisted Sub-word Modeling , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[167] Shinji Watanabe,et al. Recent Developments on Espnet Toolkit Boosted By Conformer , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[168] Puming Zhan,et al. Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR , 2019, INTERSPEECH.
[169] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[170] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res.
[171] I-Fan Chen,et al. Maximum a posteriori adaptation of network parameters in deep models , 2015, INTERSPEECH.
[172] Naomi Harte,et al. Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition , 2018, ICMI.
[173] Erich Elsen,et al. Deep Speech: Scaling up end-to-end speech recognition , 2014, ArXiv.
[174] Shinji Watanabe,et al. Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[175] Tara N. Sainath,et al. Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[176] H. H. Mao,et al. Speech Recognition and Multi-Speaker Diarization of Long Conversations , 2020, INTERSPEECH.
[177] Shiliang Zhang,et al. Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition , 2019, INTERSPEECH.
[178] Kaisheng Yao,et al. KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[179] Naoyuki Kanda,et al. Maximum-a-Posteriori-Based Decoding for End-to-End Acoustic Models , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[180] Jinyu Li,et al. Streaming End-to-End Multi-Talker Speech Recognition , 2020, IEEE Signal Processing Letters.
[181] Giovanni Motta,et al. Personalization of End-to-End Speech Recognition on Mobile Devices for Named Entities , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[182] Jonathan Le Roux,et al. End-to-End Multi-Speaker Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[183] Naoyuki Kanda,et al. Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[184] Yanmin Qian,et al. Knowledge Distillation for End-to-End Monaural Multi-Talker ASR System , 2019, INTERSPEECH.
[185] Daniel Willett,et al. Using Synthetic Audio to Improve The Recognition of Out-Of-Vocabulary Words in End-To-End ASR Systems , 2020, ArXiv.
[186] Alexei Baevski,et al. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations , 2019, ICLR.
[187] Wei Li,et al. Monotonic Infinite Lookback Attention for Simultaneous Machine Translation , 2019, ACL.
[188] Nicolas Usunier,et al. Fully Convolutional Speech Recognition , 2018, ArXiv.
[189] Hao Li,et al. Bi-Encoder Transformer Network for Mandarin-English Code-Switching Speech Recognition Using Mixture of Experts , 2020, INTERSPEECH.
[190] Yashesh Gaur,et al. On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition , 2020, INTERSPEECH.
[191] John R. Hershey,et al. Language independent end-to-end architecture for joint language identification and speech recognition , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[192] Wei Chu,et al. CASS-NAT: CTC Alignment-based Single Step Non-autoregressive Transformer for Speech Recognition , 2020, ArXiv.
[193] Tetsuji Ogawa,et al. Improved Mask-CTC for Non-Autoregressive End-to-End ASR , 2020, ArXiv.
[194] Shinji Watanabe,et al. End-to-end Monaural Multi-speaker ASR System without Pretraining , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[195] Jinyu Li,et al. Factorized Neural Transducer for Efficient Language Model Adaptation , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[196] Tara N. Sainath,et al. An Attention-Based Joint Acoustic and Text on-Device End-To-End Model , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[197] Xiangang Li,et al. Semantic Data Augmentation for End-to-End Mandarin Speech Recognition , 2021, Interspeech.
[198] Colin Raffel,et al. Online and Linear-Time Attention by Enforcing Monotonic Alignments , 2017, ICML.
[199] Yifan Gong,et al. Rapid Speaker Adaptation for Conformer Transducer: Attention and Bias Are All You Need , 2021, Interspeech.
[200] Tara N. Sainath,et al. Compression of End-to-End Models , 2018, INTERSPEECH.
[201] Gabriel Synnaeve,et al. Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters , 2020, INTERSPEECH.
[202] Tara N. Sainath,et al. Semi-supervised Training for End-to-end Models via Weak Distillation , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[203] Naoyuki Kanda,et al. Maximum a posteriori Based Decoding for CTC Acoustic Models , 2016, INTERSPEECH.
[204] Quoc V. Le,et al. Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition , 2020, ArXiv.
[205] Tara N. Sainath,et al. Multi-Dialect Speech Recognition with a Single Sequence-to-Sequence Model , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[206] James R. Glass,et al. Combining End-to-End and Adversarial Training for Low-Resource Speech Recognition , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[207] Hermann Ney,et al. A Comparison of Transformer and LSTM Encoder Decoder Models for ASR , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[208] Zhuo Chen,et al. Deep clustering: Discriminative embeddings for segmentation and separation , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[209] Xiaofeng Liu,et al. RNN-Transducer with Stateless Prediction Network , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[210] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[211] Shinji Watanabe,et al. ESPnet: End-to-End Speech Processing Toolkit , 2018, INTERSPEECH.
[212] Furu Wei,et al. UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data , 2021, ICML.
[213] Ho-Gyeong Kim,et al. Knowledge Distillation Using Output Errors for Self-attention End-to-end Models , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[214] Ronan Collobert,et al. wav2vec: Unsupervised Pre-training for Speech Recognition , 2019, INTERSPEECH.
[215] Naoyuki Kanda,et al. Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition , 2021, ArXiv.
[216] John R. Hershey,et al. Hybrid CTC/Attention Architecture for End-to-End Speech Recognition , 2017, IEEE Journal of Selected Topics in Signal Processing.
[217] Tara N. Sainath,et al. FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[218] Maja Pantic,et al. End-To-End Audio-Visual Speech Recognition with Conformers , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[219] Vikas Chandra,et al. Collaborative Training of Acoustic Encoders for Speech Recognition , 2021, Interspeech.
[220] Tara N. Sainath,et al. Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[221] Atsushi Kojima. Knowledge Distillation for Streaming Transformer-Transducer , 2021, Interspeech.
[222] Kjell Schubert,et al. Transformer-Transducer: End-to-End Speech Recognition with Self-Attention , 2019, ArXiv.
[223] Shinji Watanabe,et al. Transformer ASR with Contextual Block Processing , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[224] Davis Liang,et al. Learning Noise-Invariant Representations for Robust Speech Recognition , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[225] Ramón Fernández Astudillo,et al. Self-supervised Sequence-to-sequence ASR using Unpaired Speech and Text , 2019, INTERSPEECH.
[226] Tara N. Sainath,et al. A Spelling Correction Model for End-to-end Speech Recognition , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[227] Tara N. Sainath,et al. Cascaded Encoders for Unifying Streaming and Non-Streaming ASR , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[228] Tara N. Sainath,et al. Transformer Based Deliberation for Two-Pass Speech Recognition , 2021, 2021 IEEE Spoken Language Technology Workshop (SLT).
[229] Yifan Gong,et al. Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition , 2021, ArXiv.
[230] Yashesh Gaur,et al. Continuous Streaming Multi-Talker ASR with Dual-Path Transducers , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[231] Athanasios Mouchtaris,et al. Multi-Channel Transformer Transducer for Speech Recognition , 2021, Interspeech.
[232] Kevin Duh,et al. Multilingual End-to-End Speech Translation , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[233] Chengzhu Yu,et al. Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition , 2019, INTERSPEECH.
[234] Reinhold Häb-Umbach,et al. Neural network based spectral mask estimation for acoustic beamforming , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[235] Tara N. Sainath,et al. A Deliberation-Based Joint Acoustic and Text Decoder , 2021, Interspeech.
[236] Ding Zhao,et al. Dynamic Sparsity Neural Networks for Automatic Speech Recognition , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[237] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[238] Tara N. Sainath,et al. Scaling End-to-End Models for Large-Scale Multilingual ASR , 2021, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[239] Tara N. Sainath,et al. A Comparison of End-to-End Models for Long-Form Speech Recognition , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[240] Ryo Masumura,et al. Distilling Attention Weights for CTC-Based ASR Systems , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[241] Dong Yu,et al. Recognizing Multi-talker Speech with Permutation Invariant Training , 2017, INTERSPEECH.
[242] Rohit Prabhavalkar,et al. Dissecting User-Perceived Latency of On-Device E2E Speech Recognition , 2021, Interspeech.
[243] Julian Chan,et al. Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency , 2021, Interspeech.
[244] Razvan Pascanu,et al. Overcoming catastrophic forgetting in neural networks , 2016, Proceedings of the National Academy of Sciences.
[245] Tara N. Sainath,et al. A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[246] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[247] Yifan Gong,et al. Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator , 2020, INTERSPEECH.
[248] Adam Coates,et al. Cold Fusion: Training Seq2Seq Models Together with Language Models , 2017, INTERSPEECH.
[249] Wei Chen,et al. Modality Attention for End-to-end Audio-visual Speech Recognition , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[250] Tara N. Sainath,et al. Learning Word-Level Confidence for Subword End-To-End ASR , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[251] Jun Wang,et al. Improving Attention-Based End-to-End ASR Systems with Sequence-Based Loss Functions , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[252] Yoshua Bengio,et al. On Using Monolingual Corpora in Neural Machine Translation , 2015, ArXiv.
[253] Yashesh Gaur,et al. Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[254] Tara N. Sainath,et al. Deep Context: End-to-end Contextual Speech Recognition , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[255] Yifan Gong,et al. Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[256] Athanasios Mouchtaris,et al. End-to-End Multi-Channel Transformer for Speech Recognition , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[257] Yashesh Gaur,et al. Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[258] Shuai Zhang,et al. RNN-Transducer with Language Bias for End-to-End Mandarin-English Code-Switching Speech Recognition , 2020, 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP).
[259] Andreas Stolcke,et al. Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition , 2020, INTERSPEECH.
[260] Tara N. Sainath,et al. Improving Performance of End-to-End ASR on Numeric Sequences , 2019, INTERSPEECH.
[261] Samarth Bharadwaj,et al. Multilingual and code-switching ASR challenges for low resource Indian languages , 2021, Interspeech.
[262] Kshitiz Kumar,et al. Multi-Dialect Speech Recognition in English Using Attention on Ensemble of Experts , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[263] Yashesh Gaur,et al. Speaker Adaptation for Attention-Based End-to-End Speech Recognition , 2019, INTERSPEECH.
[264] Yu Zhang,et al. Conformer: Convolution-augmented Transformer for Speech Recognition , 2020, INTERSPEECH.
[265] Yu-An Chung,et al. Generative Pre-Training for Speech with Autoregressive Predictive Coding , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[266] Gil Keren,et al. Contextual RNN-T For Open Domain ASR , 2020, INTERSPEECH.
[267] Jesper Jensen,et al. Permutation invariant training of deep models for speaker-independent multi-talker speech separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[268] Haizhou Li,et al. Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition , 2020, INTERSPEECH.
[269] Mohan Li,et al. Transformer-Based Online Speech Recognition with Decoder-end Adaptive Computation Steps , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[270] Tatsuya Kawahara,et al. Acoustic-to-Word Attention-Based Model Complemented with Character-Level CTC-Based Model , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[271] Shinji Watanabe,et al. Non-Autoregressive Transformer for Speech Recognition , 2021, IEEE Signal Processing Letters.
[272] Hao Tang,et al. End-to-End Neural Segmental Models for Speech Recognition , 2017, IEEE Journal of Selected Topics in Signal Processing.
[273] Gunnar Evermann,et al. Class LM and word mapping for contextual biasing in End-to-End ASR , 2020, INTERSPEECH.
[274] Steve Renals,et al. Multilingual training of deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[275] Shinji Watanabe,et al. Streaming Transformer ASR with Blockwise Synchronous Beam Search , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[276] Hung-yi Lee,et al. Towards Lifelong Learning of End-to-end ASR , 2021, Interspeech.
[277] Hung-yi Lee,et al. Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation , 2021, FINDINGS.
[278] Vikas Joshi,et al. Transfer Learning Approaches for Streaming End-to-End Speech Recognition System , 2020, INTERSPEECH.
[279] Shuang Xu,et al. Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[280] John R. Hershey,et al. Multichannel End-to-end Speech Recognition , 2017, ICML.
[281] Chng Eng Siong,et al. Speech Transformer with Speaker Aware Persistent Memory , 2020, INTERSPEECH.
[282] Jiangyan Yi,et al. Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition , 2019, INTERSPEECH.
[283] Maurizio Omologo,et al. Speech Recognition with Microphone Arrays , 2001, Microphone Arrays.
[284] Stefan Riezler,et al. On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR , 2021, Interspeech.
[285] Srinivasan Umesh,et al. Investigation of Methods to Improve the Recognition Performance of Tamil-English Code-Switched Data in Transformer Framework , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[286] Yifan Gong,et al. Speaker Adaptation for End-to-End CTC Models , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[287] Matteo Negri,et al. Adapting Transformer to End-to-End Spoken Language Translation , 2019, INTERSPEECH.
[288] Matt Shannon,et al. Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for Sequence to Sequence Mapping , 2017, INTERSPEECH.
[289] Ozlem Kalinli,et al. Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios , 2021, Interspeech.
[290] Tara N. Sainath,et al. Streaming End-to-end Speech Recognition for Mobile Devices , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[291] Yoshua Bengio,et al. End-to-end attention-based large vocabulary speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[292] Preethi Jyothi,et al. An Investigation of End-to-End Models for Robust Speech Recognition , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[293] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[294] Ehsan Variani,et al. A Density Ratio Approach to Language Model Fusion in End-to-End Automatic Speech Recognition , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[295] Bhiksha Raj,et al. Microphone Array Processing for Distant Speech Recognition: From Close-Talking Microphones to Far-Field Sensors , 2012, IEEE Signal Processing Magazine.
[296] Tatsuya Kawahara,et al. Enhancing Monotonic Multihead Attention for Streaming ASR , 2020, INTERSPEECH.
[297] Kyu J. Han,et al. Multi-mode Transformer Transducer with Stochastic Future Context , 2021, Interspeech.
[298] Bin Ma,et al. Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data , 2019, INTERSPEECH.
[299] Hasim Sak,et al. Reducing Streaming ASR Model Delay with Self Alignment , 2021, Interspeech.
[300] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[301] Sanjeev Khudanpur,et al. Audio augmentation for speech recognition , 2015, INTERSPEECH.
[302] Geoffrey E. Hinton,et al. Phoneme recognition using time-delay neural networks , 1989, IEEE Trans. Acoust. Speech Signal Process.
[303] Yashesh Gaur,et al. Combination of End-to-End and Hybrid Models for Speech Recognition , 2020, INTERSPEECH.
[304] Tara N. Sainath,et al. Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling , 2019, ArXiv.
[305] Lin-Shan Lee,et al. Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[306] Yashesh Gaur,et al. Acoustic-to-Phrase Models for Speech Recognition , 2019, INTERSPEECH.
[307] Nikko Strom,et al. Frequency Domain Multi-channel Acoustic Modeling for Distant Speech Recognition , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[308] Hermann Ney,et al. Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition , 2021, Interspeech.
[309] Zhong Meng,et al. Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability , 2020, INTERSPEECH.
[310] Tara N. Sainath,et al. Phoebe: Pronunciation-aware Contextualization for End-to-end Speech Recognition , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[311] Lei Xie,et al. WeNet: Production Oriented Streaming and Non-Streaming End-to-End Speech Recognition Toolkit , 2021, Interspeech.
[312] Janne Pylkkönen,et al. Fast Text-Only Domain Adaptation of RNN-Transducer Prediction Network , 2021, Interspeech.
[313] Xiaofei Wang,et al. A Comparative Study on Transformer vs RNN in Speech Applications , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[314] Tara N. Sainath,et al. Less is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[315] Naoyuki Kanda,et al. Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers , 2020, INTERSPEECH.
[316] Alexander H. Waibel,et al. Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition , 2021, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[317] Lei Xie,et al. Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition , 2020, INTERSPEECH.