An end-to-end deep learning model to recognize Farsi speech from raw input
暂无分享,去创建一个
[1] Dimitri Palaz,et al. Analysis of CNN-based speech recognition system using raw speech as input , 2015, INTERSPEECH.
[2] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[3] Jonathan G. Fiscus,et al. Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST , 1993 .
[4] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[5] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[6] Andreas Stolcke,et al. The Microsoft 2017 Conversational Speech Recognition System , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Tara N. Sainath,et al. Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[9] Jason Weston,et al. Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..
[10] Alex Graves,et al. Supervised Sequence Labelling with Recurrent Neural Networks , 2012, Studies in Computational Intelligence.
[11] Hsiao-Wuen Hon,et al. Speaker-independent phone recognition using hidden Markov models , 1989, IEEE Trans. Acoust. Speech Signal Process..
[12] Sanjeev Khudanpur,et al. Acoustic Modelling from the Signal Domain Using CNNs , 2016, INTERSPEECH.
[13] Tara N. Sainath,et al. Learning the speech front-end with raw waveform CLDNNs , 2015, INTERSPEECH.
[14] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[15] H Hermansky,et al. Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.
[16] P. Mermelstein,et al. Distance measures for speech recognition, psychological and instrumental , 1976 .
[17] Zhiheng Huang,et al. Residual Convolutional CTC Networks for Automatic Speech Recognition , 2017, ArXiv.
[18] M Bijankhan,et al. FARSDAT- THE SPEECH DATABASE OF FARSI SPOKEN LANGUAGE , 1994 .
[19] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[20] Dimitri Palaz,et al. Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks , 2013, INTERSPEECH.
[21] Carla Teixeira Lopes,et al. TIMIT Acoustic-Phonetic Continuous Speech Corpus , 2012 .
[22] Lawrence K. Saul,et al. Large Margin Gaussian Mixture Modeling for Phonetic Classification and Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[23] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[24] Geoffrey E. Hinton,et al. A Simple Way to Initialize Recurrent Networks of Rectified Linear Units , 2015, ArXiv.