Global Normalization for Streaming Speech Recognition in a Modular Framework
[1] Jinsong Zhang, et al. Advancing CTC-CRF Based End-to-End Speech Recognition with Wordpieces and Conformers, 2021, ArXiv.
[2] Yu Zhang, et al. Conformer: Convolution-augmented Transformer for Speech Recognition, 2020, INTERSPEECH.
[3] Xiaofeng Liu, et al. Rnn-Transducer with Stateless Prediction Network, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Cyril Allauzen, et al. Hybrid Autoregressive Transducer (HAT), 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Qian Zhang, et al. Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Zhijian Ou, et al. CRF-based Single-stage Acoustic Modeling with CTC Topology, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Chris Dyer, et al. An Empirical Investigation of Global and Local Normalization for Recurrent Neural Sequence Models Using a Continuous Relaxation to Beam Search, 2019, NAACL.
[8] Tom Bagby, et al. End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow, 2017, INTERSPEECH.
[9] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[10] Gabriel Synnaeve, et al. Wav2Letter: an End-to-End ConvNet-based Speech Recognition System, 2016, ArXiv.
[11] Yiming Wang, et al. Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI, 2016, INTERSPEECH.
[12] Slav Petrov, et al. Globally Normalized Transition-Based Neural Networks, 2016, ACL.
[13] Quoc V. Le, et al. Listen, Attend and Spell, 2015, ArXiv.
[14] Sanjeev Khudanpur, et al. Librispeech: An ASR corpus based on public domain audio books, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[16] Mehryar Mohri, et al. On the Disambiguation of Weighted Automata, 2014, CIAA.
[17] Alex Graves, et al. Sequence Transduction with Recurrent Neural Networks, 2012, ArXiv.
[18] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, 2012, IEEE Signal Processing Magazine.
[19] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.
[20] Geoffrey Zweig, et al. A segmental CRF approach to large vocabulary continuous speech recognition, 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.
[21] Steve Renals, et al. Speech Recognition Using Augmented Conditional Random Fields, 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[22] Mehryar Mohri, et al. Weighted Automata Algorithms, 2009.
[23] Mehryar Mohri, et al. Speech Recognition with Weighted Finite-State Transducers, 2008.
[24] Noah A. Smith, et al. Weighted and Probabilistic Context-Free Grammars Are Equally Expressive, 2007, CL.
[25] Johan Schalkwyk, et al. OpenFst: A General and Efficient Weighted Finite-State Transducer Library, 2007, CIAA.
[26] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.
[27] Alex Acero, et al. Hidden conditional random fields for phone classification, 2005, INTERSPEECH.
[28] Mehryar Mohri, et al. Semiring Frameworks and Algorithms for Shortest-Distance Problems, 2002, J. Autom. Lang. Comb.
[29] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.
[30] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[31] Roni Rosenfeld, et al. A whole sentence maximum entropy language model, 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.
[32] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[33] Yoshua Bengio, et al. Global training of document processing systems using graph transformer networks, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[34] J. S. Bridle, et al. An Alphanet approach to optimising input transformations for continuous speech recognition, 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.
[35] Hervé Bourlard, et al. Continuous speech recognition using multilayer perceptrons with hidden Markov models, 1990, International Conference on Acoustics, Speech, and Signal Processing.
[36] Peter F. Brown, et al. The acoustic-modeling problem in automatic speech recognition, 1987.
[37] Lalit R. Bahl, et al. Maximum mutual information estimation of hidden Markov model parameters for speech recognition, 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.