Statistical Language Models Based on Neural Networks

Statistical language models are a crucial part of many successful applications, such as automatic speech recognition and statistical machine translation (for example, the well-known Google Translate). Traditional techniques for estimating these models are based on N-gram counts. Despite the known weaknesses of N-grams and the huge effort of research communities across many fields (speech recognition, machine translation, neuroscience, artificial intelligence, natural language processing, data compression, psychology, etc.), N-grams have remained essentially the state of the art. The goal of this thesis is to present various architectures of language models that are based on artificial neural networks. Although these models are computationally more expensive than N-gram models, the presented techniques make it possible to apply them efficiently in state-of-the-art systems. The achieved reductions in the word error rate of speech recognition systems are up to 20% against a state-of-the-art N-gram model. The presented recurrent neural network based model achieves the best published performance on the well-known Penn Treebank setup.

Keywords: language model, neural network, recurrent, maximum entropy, speech recognition, data compression, artificial intelligence
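As a concrete illustration of the count-based estimation that the abstract contrasts with neural models, the following minimal sketch computes maximum-likelihood bigram probabilities from a toy corpus. The function name and corpus are illustrative only; real N-gram models additionally apply smoothing (e.g. Kneser-Ney) so that unseen word pairs do not receive zero probability.

```python
from collections import Counter

def bigram_mle(tokens):
    """Maximum-likelihood bigram estimates: P(w | h) = count(h, w) / count(h)."""
    history = Counter(tokens[:-1])                 # counts of each history word h
    pairs = Counter(zip(tokens[:-1], tokens[1:]))  # counts of each bigram (h, w)
    return {(h, w): c / history[h] for (h, w), c in pairs.items()}

corpus = "the cat sat on the mat".split()
probs = bigram_mle(corpus)
print(probs[("the", "cat")])  # count(the cat) / count(the) = 1 / 2 = 0.5
```

Because the estimates are pure relative frequencies, the probabilities conditioned on any history sum to one, which is the property smoothing methods must preserve while redistributing mass to unseen events.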
