E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR
Tara N. Sainath | Rohit Prabhavalkar | David Rybach | Cyril Allauzen | Zhiyun Lu | W. R. Huang | Shuo-yiin Chang | Cal Peyser
[1] R. Maas, et al. VADOI: Voice-Activity-Detection Overlapping Inference for End-To-End Long-Form Speech Recognition, 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Liang Lu, et al. Endpoint Detection for Streaming End-to-End Multi-Talker ASR, 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Tara N. Sainath, et al. Tied & Reduced RNN-T Decoder, 2021, Interspeech.
[4] Tara N. Sainath, et al. An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling, 2021, Interspeech.
[5] Sree Hari Krishnan Parthasarathi, et al. Exploiting Large-scale Teacher-Student Training for On-device Acoustic Models, 2021, TDS.
[6] Tara N. Sainath, et al. Less is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging, 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Tara N. Sainath, et al. FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization, 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Tara N. Sainath, et al. RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions, 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[9] Rohit Prabhavalkar, et al. Input Length Matters: An Empirical Study Of RNN-T And MWER Training For Long-form Telephony Speech Recognition, 2021, ArXiv.
[10] Meng Li, et al. Long-Running Speech Recognizer: An End-to-End Multi-Task Learning Framework for Online ASR and VAD, 2021, ArXiv.
[11] Tara N. Sainath, et al. Low Latency Speech Recognition Using End-to-End Prefetching, 2020, INTERSPEECH.
[12] Tara N. Sainath, et al. Towards Fast and Accurate Streaming End-To-End ASR, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Tara N. Sainath, et al. A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Cyril Allauzen, et al. Hybrid Autoregressive Transducer (HAT), 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Li-Rong Dai, et al. Segment boundary detection directed attention for online end-to-end speech recognition, 2020, EURASIP J. Audio Speech Music. Process.
[16] Joon-Hyuk Chang, et al. End-to-End Speech Endpoint Detection Utilizing Acoustic and Language Modeling Knowledge for Online Low-Latency Speech Recognition, 2020, IEEE Access.
[17] Hagen Soltau, et al. Monotonic Recurrent Neural Network Transducer and Decoding Strategies, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[18] Tara N. Sainath, et al. A Comparison of End-to-End Models for Long-Form Speech Recognition, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[19] Tara N. Sainath, et al. Recognizing Long-Form Speech Using Streaming End-to-End Models, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[20] Tara N. Sainath, et al. Joint Endpointing and Decoding with End-to-end Models, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Quoc V. Le, et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, INTERSPEECH.
[22] Tara N. Sainath, et al. Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling, 2019, ArXiv.
[23] Roland Maas, et al. Combining Acoustic Embeddings and Decoding Features for End-of-Utterance Detection in Real-Time Far-Field Speech Recognition Systems, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Zulfiqar Ali, et al. Innovative Method for Unsupervised Voice Activity Detection and Classification of Audio Segments, 2018, IEEE Access.
[25] Tara N. Sainath, et al. Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Hagen Soltau, et al. Reducing the computational complexity for whole word models, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[27] Matt Shannon, et al. Improved End-of-Query Detection for Streaming Speech Recognition, 2017, INTERSPEECH.
[28] Tara N. Sainath, et al. Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home, 2017, INTERSPEECH.
[29] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[30] Tara N. Sainath, et al. Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection, 2016, INTERSPEECH.
[31] Hank Liao, et al. Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription, 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[32] Yifan Gong, et al. Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM, 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).
[33] Juan Manuel Górriz, et al. Voice Activity Detection. Fundamentals and Speech Recognition System Robustness, 2007.