An Unsupervised Approach for Linking Automatically Extracted and Manually Crafted LTAGs

Although the lack of semantic representation in automatically extracted LTAGs is an obstacle to using this formalism, these grammars have received more attention than before owing to the advent of powerful statistical parsers trained on them. In contrast to this class of grammars, there are widely used manually crafted LTAGs that are enriched with semantic representation but suffer from the lack of efficient parsers. The semantic representation available in the latter grammars, together with the statistical capabilities of the former, encouraged us to construct a link between them. Here, focusing on the automatically extracted LTAG used by MICA [4] and the manually crafted English LTAG, namely the XTAG grammar [32], we propose a statistical approach based on hidden Markov models (HMMs) that maps each sequence of elementary trees of the former onto a sequence of elementary trees of the latter. To keep the HMM training algorithm from converging to a local optimum, an EM-based learning process for initializing the HMM parameters is also proposed. Experimental results show that the mapping method provides a satisfactory way to cover the deficiencies of one grammar with the capabilities of the other.
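To make the sequence-mapping idea concrete, the following is a minimal sketch of Viterbi decoding over an HMM. It assumes, purely for illustration, that the XTAG elementary trees play the role of hidden states and the MICA elementary trees the role of observations; the abstract does not state which side is hidden. The tree labels (`X_np`, `X_vtrans`, `M_3`, `M_27`), all probabilities, and the probability floor are hypothetical toy values, not taken from the paper, and the EM-based initialization is not shown.

```python
import math

FLOOR = 1e-12  # assumed stand-in probability for unseen events


def _log(p):
    """Log of a probability, floored so that zero entries do not raise."""
    return math.log(max(p, FLOOR))


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # best[t][s]: log-probability of the best path ending in state s at step t
    best = [{s: _log(start_p.get(s, 0.0)) + _log(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + _log(trans_p[p].get(s, 0.0))) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + _log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: two XTAG-side trees (hidden), two MICA-side trees (observed).
states = ["X_np", "X_vtrans"]
start_p = {"X_np": 0.7, "X_vtrans": 0.3}
trans_p = {
    "X_np": {"X_np": 0.2, "X_vtrans": 0.8},
    "X_vtrans": {"X_np": 0.9, "X_vtrans": 0.1},
}
emit_p = {
    "X_np": {"M_3": 0.9, "M_27": 0.1},
    "X_vtrans": {"M_3": 0.05, "M_27": 0.95},
}
mica_sequence = ["M_3", "M_27", "M_3"]
xtag_sequence = viterbi(mica_sequence, states, start_p, trans_p, emit_p)
print(xtag_sequence)  # ['X_np', 'X_vtrans', 'X_np']
```

Working in log space avoids numerical underflow on longer sentences. In a trained model the three probability tables would come from the learning procedure rather than being written by hand.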

[1]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[2]  Martha Palmer,et al.  Integrating compositional semantics into a verb lexicon , 2000, COLING.

[3]  Nizar Habash,et al.  Extracting a Tree Adjoining Grammar from the Penn Arabic Treebank , 2004 .

[4]  Richard C. Waters,et al.  Tree insertion grammar , 1995 .

[5]  Srinivas Bangalore,et al.  Supertagging: An Approach to Almost Parsing , 1999, CL.

[6]  Aravind K. Joshi,et al.  Incremental LTAG Parsing , 2005, HLT/EMNLP.

[7]  Heshaam Faili.  From Partial toward Full Parsing , 2009, RANLP.

[8]  George R. Doddington,et al.  The ATIS Spoken Language Systems Pilot Corpus , 1990, HLT.

[9]  Anne Abeillé,et al.  A Lexicalized Tree Adjoining Grammar for English , 1990 .

[10]  Vijay K. Shanker,et al.  Towards efficient statistical parsing using lexicalized grammatical information , 2002 .

[11]  Srinivas Bangalore,et al.  New Models for Improving Supertag Disambiguation , 1999, EACL.

[12]  Fei Xia,et al.  Automatic grammar generation from two different perspectives , 2001 .

[13]  Eric Brill,et al.  Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging , 1995, CL.

[14]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[15]  Aravind K. Joshi,et al.  Tree Adjunct Grammars , 1975, J. Comput. Syst. Sci.

[16]  Pierre Baldi,et al.  Smooth On-Line Learning Algorithms for Hidden Markov Models , 1994, Neural Computation.

[17]  Neville Ryant,et al.  Assigning XTAG Trees to VerbNet , 2004, TAG+.

[18]  Heshaam Faili,et al.  An Application of Lexicalized Grammars in English-Persian Translation , 2004, ECAI.

[19]  Richard C. Waters,et al.  Tree Insertion Grammar: A Cubic-Time, Parsable Formalism that Lexicalizes Context-Free Grammar without Changing the Trees Produced , 1995, CL.

[20]  Jungyeul Park.  Extraction of Tree Adjoining Grammars from a Treebank for Korean , 2006, ACL.

[21]  Alexis Nasr,et al.  MICA: A Probabilistic Dependency Parser Based on Tree Insertion Grammars (Application Note) , 2009, HLT-NAACL.

[22]  Jun'ichi Tsujii,et al.  LiLFes - Towards a Practical HPSG Parser , 1998, COLING-ACL.

[23]  Aravind K. Joshi,et al.  Natural language parsing: Tree adjoining grammars: How much context-sensitivity is required to provide reasonable structural descriptions? , 1985 .

[24]  Martha Palmer,et al.  Class-Based Construction of a Verb Lexicon , 2000, AAAI/IAAI.

[25]  L. Baum,et al.  An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process , 1972 .

[26]  Gertjan van Noord.  Head-Corner Parsing for TAG , 1994, Comput. Intell.

[27]  Günter Neumann.  Automatic extraction of stochastic lexicalized tree grammars from treebanks , 1998, TAG+.

[28]  Heshaam Faili,et al.  Augmenting the automated extracted tree adjoining grammars by semantic representation , 2010, Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010).

[29]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .