A Neural Syntactic Language Model

This paper presents a study of using neural probabilistic models within a syntax-based language model. The neural probabilistic model uses a distributed representation of the items in the conditioning history and is effective at capturing long-range dependencies. Embedding neural network models in the structured language model (SLM) lets it make efficient use of the large amount of information available in a syntactic parse when predicting the next word in a string. We present several scenarios for integrating neural networks into the SLM, together with derivations of the corresponding training procedures. Experiments on the UPenn Treebank and the Wall Street Journal corpus show significant improvements in perplexity and word error rate (WER) over the baseline SLM. Furthermore, comparisons with standard and neural-network-based N-gram models with arbitrarily long contexts show that the syntactic information is in fact very helpful in estimating the word-string probability. Overall, our neural syntax-based model achieves the best published perplexity and WER results for the given data sets.
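To make the core idea concrete, the sketch below shows a minimal neural probabilistic language model of the kind the abstract describes: each item in the conditioning history is mapped to a distributed (embedding) representation, the embeddings are fed through a hidden layer, and a softmax produces the next-word distribution. In the syntax-based setting, the context identifiers could be the head words exposed by a partial parse rather than the preceding n-1 surface words. All sizes and parameter names here are illustrative assumptions, not the paper's actual configuration.

```python
import math
import random

random.seed(0)

# Hypothetical toy sizes: vocabulary, embedding dim, hidden dim, context length.
V, d, h, n_ctx = 8, 4, 5, 2

def mat(rows, cols):
    """Randomly initialised weight matrix (illustrative, untrained)."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

E = mat(V, d)          # distributed representations of vocabulary items
W = mat(n_ctx * d, h)  # concatenated context embeddings -> hidden layer
U = mat(h, V)          # hidden layer -> output scores, one per word

def next_word_probs(context_ids):
    """P(w | context): here the context could be the SLM's exposed
    syntactic heads instead of the previous n-1 words."""
    # Concatenate the embeddings of the conditioning items.
    x = [v for i in context_ids for v in E[i]]
    # Hidden layer with tanh nonlinearity.
    hid = [math.tanh(sum(x[j] * W[j][k] for j in range(len(x))))
           for k in range(h)]
    # Output scores and a numerically stable softmax.
    scores = [sum(hid[k] * U[k][w] for k in range(h)) for w in range(V)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

p = next_word_probs([3, 5])  # e.g. the two most recent head words
```

Because the conditioning items pass through shared embeddings rather than being treated as atomic n-gram contexts, the model can generalise to histories never seen in training, which is what makes arbitrarily long (or syntactically selected) contexts practical.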
