Bridge the gap between statistical and hand-crafted grammars

Lexicalized Tree Adjoining Grammar (LTAG) is a rich formalism for NLP tasks such as semantic interpretation, parsing, machine translation, and information retrieval. Depending on the specific NLP task, different kinds of LTAGs may be developed for a language. Each of these LTAGs is enriched with specific features, such as semantic representations or statistical information, that make it suitable for that task. Because these capabilities are distributed among different LTAGs, it is difficult to benefit from all of them in a single NLP application. This paper discusses a statistical model that bridges two kinds of LTAGs for a natural language in order to benefit from the capabilities of both. To do so, an HMM was trained that maps an elementary tree sequence of a source LTAG onto an elementary tree sequence of a target LTAG. Training was performed with the standard HMM training algorithm, Baum-Welch. To lead the training algorithm to a better solution, the initial state of the HMM was also trained by a novel EM-based semi-supervised bootstrapping algorithm. The model was tested on two English LTAGs, XTAG (XTAG-Group, 2001) and MICA's grammar (Bangalore et al., 2009), as the target and source LTAGs, respectively. The empirical results confirm that the model provides a satisfactory way of linking these LTAGs so that they can share their capabilities.
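
The following minimal sketch illustrates the general technique named above: Baum-Welch training of a discrete HMM, followed by Viterbi decoding to link one tree sequence to another. It is not the paper's implementation. It assumes that the hidden states stand for target-LTAG elementary trees and the observations for source-LTAG elementary trees, that trees are encoded as small integer ids, and that the initial parameters are random rather than produced by the paper's bootstrapping step. All names (`forward_backward`, `baum_welch`, `viterbi`) and the toy data are hypothetical.

```python
import numpy as np

def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; returns alpha, beta and per-step scales."""
    T, N = len(obs), len(pi)
    alpha, beta, scale = np.zeros((T, N)), np.ones((T, N)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    return alpha, beta, scale

def baum_welch(obs, pi, A, B, n_iter=20):
    """Re-estimate (pi, A, B) from one observation sequence with EM."""
    obs = np.asarray(obs)
    log_liks = []
    for _ in range(n_iter):
        alpha, beta, scale = forward_backward(obs, pi, A, B)
        log_liks.append(np.log(scale).sum())   # log P(obs | current model)
        gamma = alpha * beta                   # state posteriors, rows sum to 1
        # xi[t, i, j]: posterior of being in state i at t and state j at t + 1
        emit_next = B[:, obs[1:]].T * beta[1:] / scale[1:, None]
        xi = alpha[:-1, :, None] * A[None] * emit_next[:, None, :]
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])], axis=1)
        B /= gamma.sum(axis=0)[:, None]
    return pi, A, B, log_liks

def viterbi(obs, pi, A, B):
    """Most likely hidden-state sequence for obs, computed in log space."""
    with np.errstate(divide="ignore"):
        lpi, lA, lB = np.log(pi), np.log(A), np.log(B)
    delta, back = lpi + lB[:, obs[0]], []
    for o in obs[1:]:
        scores = delta[:, None] + lA           # scores[i, j]: best path to i, then i -> j
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + lB[:, o]
    path = [int(delta.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]

# Toy data (assumption): 3 source-tree ids observed, 2 target-tree ids hidden.
rng = np.random.default_rng(0)
obs = np.array([0, 1, 2, 1, 0, 0, 1, 2, 2, 1, 0, 1])
n_states, n_symbols = 2, 3
pi0 = np.full(n_states, 1.0 / n_states)
A0 = rng.random((n_states, n_states))
A0 /= A0.sum(axis=1, keepdims=True)
B0 = rng.random((n_states, n_symbols))
B0 /= B0.sum(axis=1, keepdims=True)

pi, A, B, log_liks = baum_welch(obs, pi0, A0, B0)
decoded = viterbi(obs, pi, A, B)   # one target-tree id per source-tree id
```

Because Baum-Welch only converges to a local optimum, the result depends on the starting parameters, which is why the abstract's separate training of the initial state matters; a better-informed initialization would replace `pi0`, `A0`, and `B0` in this sketch.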

[1]  George R. Doddington,et al.  The ATIS Spoken Language Systems Pilot Corpus , 1990, HLT.

[2]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[3]  Heshaam Faili From Partial toward Full Parsing , 2009, RANLP.

[4]  Alex Waibel,et al.  Readings in speech recognition , 1990 .

[5]  Martha Palmer,et al.  Class-Based Construction of a Verb Lexicon , 2000, AAAI/IAAI.

[6]  Alexis Nasr,et al.  MICA: A Probabilistic Dependency Parser Based on Tree Insertion Grammars (Application Note) , 2009, HLT-NAACL.

[7]  Heshaam Faili,et al.  Constructing Linguistically Motivated Structures from Statistical Grammars , 2011, RANLP.

[8]  Aravind K. Joshi,et al.  Natural language parsing: Tree adjoining grammars: How much context-sensitivity is required to provide reasonable structural descriptions? , 1985 .

[9]  Heshaam Faili,et al.  An Unsupervised Approach for Linking Automatically Extracted and Manually Crafted LTAGs , 2011, CICLing.

[10]  Aravind K. Joshi,et al.  Tree Adjunct Grammars , 1975, J. Comput. Syst. Sci..

[11]  L. Baum,et al.  A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains , 1970 .

[12]  Srinivas Bangalore,et al.  Supertagging: An Approach to Almost Parsing , 1999, CL.

[13]  Martha Palmer,et al.  Integrating compositional semantics into a verb lexicon , 2000, COLING.

[14]  Vijay K. Shanker,et al.  Towards efficient statistical parsing using lexicalized grammatical information , 2002 .

[15]  Fei Xia,et al.  Evaluating the Coverage of LTAGs on Annotated Corpora , 2009 .

[16]  Richard C. Waters,et al.  Tree Insertion Grammar: A Cubic-Time, Parsable Formalism that Lexicalizes Context-Free Grammar without Changing the Trees Produced , 1995, CL.

[17]  Fei Xia,et al.  Automatic grammar generation from two different perspectives , 2001 .

[18]  Laura Kallmeyer,et al.  Parsing Beyond Context-Free Grammars , 2010, Cognitive Technologies.

[19]  Anoop Sarkar Combining Supertagging and Lexicalized Tree-Adjoining Grammar Parsing , 2006 .

[20]  Owen Rambow,et al.  The Hidden TAG Model: Synchronous Grammars for Parsing Resource-Poor Languages , 2006, TAG.

[21]  Heshaam Faili,et al.  Augmenting the automated extracted tree adjoining grammars by semantic representation , 2010, Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010).

[22]  Neville Ryant,et al.  Assigning XTAG Trees to VerbNet , 2004, TAG+.

[23]  Heshaam Faili,et al.  An Application of Lexicalized Grammars in English-Persian Translation , 2004, ECAI.

[24]  George Cybenko,et al.  Efficient computation of the hidden Markov model entropy for a given observation sequence , 2005, IEEE Transactions on Information Theory.

[25]  Laura Kallmeyer Parsing Range Concatenation Grammars , 2010 .

[26]  Anne Abeillé,et al.  A Lexicalized Tree Adjoining Grammar for English , 1990 .