Automatic Speech Recognition Based on Neural Networks
Ralf Schlüter | Patrick Doetsch | Albert Zeyer | Markus Kitza | Kazuki Irie | Zoltán Tüske | Pavel Golik | Tobias Menne