Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models
Tara N. Sainath | Ke Hu | Rohit Prabhavalkar | Golan Pundak | Antoine Bruguier
[1] Tara N. Sainath, et al. Shallow-Fusion End-to-End Contextual Biasing, INTERSPEECH, 2019.
[2] Alex Graves, et al. Sequence Transduction with Recurrent Neural Networks, arXiv, 2012.
[3] Hagen Soltau, et al. Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition, INTERSPEECH, 2016.
[4] Tara N. Sainath, et al. An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model, IEEE ICASSP, 2018.
[5] Tara N. Sainath, et al. Streaming End-to-End Speech Recognition for Mobile Devices, IEEE ICASSP, 2019.
[6] Yuan Yu, et al. TensorFlow: A System for Large-Scale Machine Learning, OSDI, 2016.
[7] David Li, et al. Cross-Lingual Phoneme Mapping for Language Robust Contextual Speech Recognition, IEEE ICASSP, 2018.
[8] Patrick Nguyen, et al. Model Unit Exploration for Sequence-to-Sequence Speech Recognition, arXiv, 2019.
[9] Tara N. Sainath, et al. Contextual Speech Recognition in End-to-End Neural Network Systems Using Beam Search, INTERSPEECH, 2018.
[10] Tara N. Sainath, et al. Phoebe: Pronunciation-Aware Contextualization for End-to-End Speech Recognition, IEEE ICASSP, 2019.
[11] Tara N. Sainath, et al. No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models, IEEE ICASSP, 2018.
[12] Tara N. Sainath, et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models, IEEE ICASSP, 2018.
[13] Mike Schuster, et al. Japanese and Korean Voice Search, IEEE ICASSP, 2012.
[14] Brian Roark, et al. Composition-Based On-the-Fly Rescoring for Salient N-gram Biasing, INTERSPEECH, 2015.
[15] Tara N. Sainath, et al. Deep Context: End-to-End Contextual Speech Recognition, IEEE SLT, 2018.
[16] Heiga Zen, et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis, ICML, 2017.
[17] Rohit Prabhavalkar, et al. Exploring Architectures, Data and Units for Streaming End-to-End Speech Recognition with RNN-Transducer, IEEE ASRU, 2017.
[18] Jürgen Schmidhuber, et al. Long Short-Term Memory, Neural Computation, 1997.
[19] Tara N. Sainath, et al. Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home, INTERSPEECH, 2017.
[20] Brian Roark, et al. Bringing Contextual Information to Google Speech Recognition, INTERSPEECH, 2015.
[21] Rohit Prabhavalkar, et al. On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition, INTERSPEECH, 2019.
[22] Antoine Bruguier, et al. Pronunciation Learning with RNN-Transducers, INTERSPEECH, 2017.
[23] Quoc V. Le, et al. Listen, Attend and Spell: A Neural Network for Large Vocabulary Conversational Speech Recognition, IEEE ICASSP, 2016.