Turn-Taking Prediction for Natural Conversational Speech
Tara N. Sainath | Bo Li | Trevor Strohman | Yanzhang He | Shuo-yiin Chang | Qiao Liang | Chaoyang Zhang