A Joint Language Model With Fine-grain Syntactic Tags

We present a scalable joint language model designed to utilize fine-grain syntactic tags. We discuss the challenges such a design faces and describe solutions that scale well to large tagsets and corpora. We advocate the use of relatively simple tags that do not require deep linguistic knowledge of the language but provide more structural information than POS tags and can be derived from automatically generated parse trees, a combination of properties that makes the model easy to adopt for new languages. We propose two fine-grain tagsets and evaluate our model using these tags, as well as POS tags and SuperARV tags, in a speech recognition task, and we discuss future directions.
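Although the abstract does not spell out the model form, the core idea of a joint language model is to predict each word and its syntactic tag together, estimating P(w_i, t_i | w_1..w_{i-1}, t_1..t_{i-1}) and recovering word probabilities by summing over tags. The sketch below is a minimal illustration of that idea as a joint word-tag bigram with simple interpolation; the class name, corpus format, and smoothing scheme are illustrative assumptions, not the paper's implementation, which must also address the scaling challenges the abstract mentions.

```python
from collections import defaultdict

# Minimal sketch of a joint word-tag language model (illustrative only):
# events are (word, tag) pairs, so the history carries lexical and
# syntactic information at once. Word probabilities fall out by summing
# the joint probability over all tags of the predicted word.
class JointBigramLM:
    def __init__(self):
        self.bigram = defaultdict(int)   # count of (w1,t1) -> (w2,t2) transitions
        self.history = defaultdict(int)  # count of (w,t) appearing as a history
        self.unigram = defaultdict(int)  # count of (w,t) appearing as an event
        self.total = 0
        self.tags = set()

    def train(self, tagged_sentences):
        # tagged_sentences: iterable of [(word, tag), ...] lists
        for sent in tagged_sentences:
            events = [("<s>", "<s>")] + list(sent)
            for prev, cur in zip(events, events[1:]):
                self.bigram[(prev, cur)] += 1
                self.history[prev] += 1
                self.unigram[cur] += 1
                self.total += 1
                self.tags.add(cur[1])

    def joint_prob(self, prev, cur, lam=0.7):
        # Linear interpolation with the joint unigram; a real system needs
        # stronger smoothing (e.g. clustering of joint events), since
        # fine-grain tags multiply the size of the event space.
        uni = self.unigram[cur] / self.total if self.total else 0.0
        bi = self.bigram[(prev, cur)] / self.history[prev] if self.history[prev] else 0.0
        return lam * bi + (1.0 - lam) * uni

    def word_prob(self, prev, word, lam=0.7):
        # P(word | history) = sum over tags of P(word, tag | history)
        return sum(self.joint_prob(prev, (word, t), lam) for t in self.tags)


lm = JointBigramLM()
lm.train([[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")]])
print(lm.word_prob(("the", "DT"), "dog"))  # marginalizes over the learned tagset
```

The design point this makes concrete is the one the abstract raises: because the prediction unit is the (word, tag) pair, richer tagsets directly inflate the event space, which is why scalable parameter estimation is central to the paper.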
