Conversation-oriented ASR with multi-look-ahead CBS architecture
[1] Tetsunori Kobayashi,et al. Response Timing Estimation for Spoken Dialog Systems Based on Syntactic Completeness Prediction , 2023, 2022 IEEE Spoken Language Technology Workshop (SLT).
[2] Tara N. Sainath,et al. Turn-Taking Prediction for Natural Conversational Speech , 2022, INTERSPEECH.
[3] M. Seltzer,et al. Streaming parallel transducer beam search with fast-slow cascaded encoders , 2022, INTERSPEECH.
[4] Shinji Watanabe,et al. A Study of Transducer Based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies , 2021, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[5] Tetsuji Ogawa,et al. An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR , 2021, 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[6] Tara N. Sainath,et al. An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling , 2021, INTERSPEECH.
[7] Julian Chan,et al. Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency , 2021, INTERSPEECH.
[8] Tara N. Sainath,et al. Cascaded Encoders for Unifying Streaming and Non-Streaming ASR , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Tetsunori Kobayashi,et al. Improved Mask-CTC for Non-Autoregressive End-to-End ASR , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Yu Wu,et al. Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Shinji Watanabe,et al. Streaming Transformer ASR with Blockwise Synchronous Beam Search , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[12] Shinji Watanabe,et al. End-to-End ASR with Adaptive Span Self-Attention , 2020, INTERSPEECH.
[13] Yu Zhang,et al. Conformer: Convolution-augmented Transformer for Speech Recognition , 2020, INTERSPEECH.
[14] Qian Zhang,et al. Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Jonathan Le Roux,et al. Streaming Automatic Speech Recognition with the Transformer Model , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Shinji Watanabe,et al. Transformer ASR with Contextual Block Processing , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[17] Xiaofei Wang,et al. A Comparative Study on Transformer vs RNN in Speech Applications , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[18] Edouard Grave,et al. Adaptive Attention Span in Transformers , 2019, ACL.
[19] Tara N. Sainath,et al. Joint Endpointing and Decoding with End-to-end Models , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Hermann Ney,et al. RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation , 2019, INTERSPEECH.
[21] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[22] Taku Kudo,et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing , 2018, EMNLP.
[23] Ke Li,et al. A Time-Restricted Self-Attention Layer for ASR , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Shinji Watanabe,et al. ESPnet: End-to-End Speech Processing Toolkit , 2018, INTERSPEECH.
[25] John R. Hershey,et al. Hybrid CTC/Attention Architecture for End-to-End Speech Recognition , 2017, IEEE Journal of Selected Topics in Signal Processing.
[26] Matt Shannon,et al. Improved End-of-Query Detection for Streaming Speech Recognition , 2017, INTERSPEECH.
[27] Lukasz Kaiser,et al. Attention Is All You Need , 2017, NIPS.
[28] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[30] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.
[31] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[32] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[33] Janet M. Baker,et al. The Design for the Wall Street Journal-based CSR Corpus , 1992, HLT.