Morphological Analysis Can Improve a CCG Parser for English

Because English is a low-morphology language, current statistical parsers tend to ignore morphology and accept some level of redundancy. This paper investigates how costly that redundancy is for a lexicalised grammar such as CCG. We use morphological analysis to split verb inflectional suffixes into separate tokens, so that each suffix can receive its own lexical category. We find that this improves accuracy when the splits are based on correct POS tags, but that errors in gold-standard or automatically assigned POS tags are costly for the system. This shows that the parser can benefit from morphological analysis, so long as the analysis is correct.
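
The suffix-splitting step can be illustrated with a minimal sketch. It is not the paper's morphological analyser: the function name `split_verb_suffixes`, the tag-to-suffix table, the `-s`/`-ed`/`-ing` token spelling, the relabelling of the stem as `VB`, and the tiny irregular-verb list are all assumptions made for illustration. The sketch assumes Penn Treebank POS tags and strips only regular suffixes by string matching, without restoring stems (for example, "making" would yield "mak").

```python
# Penn Treebank verb tags mapped to the regular suffix they usually mark
# (an illustrative table, not taken from the paper).
SUFFIX_BY_TAG = {"VBZ": "s", "VBD": "ed", "VBN": "ed", "VBG": "ing"}

# A few irregular forms that naive suffix stripping would mangle (assumed list).
IRREGULAR = {"is", "has", "does"}


def split_verb_suffixes(tagged):
    """Split regular verb suffixes into their own tokens, driven by POS tags.

    `tagged` is a list of (word, pos) pairs; the result is a new list in
    which e.g. ("walked", "VBD") becomes ("walk", "VB"), ("-ed", "VBD").
    """
    out = []
    for word, tag in tagged:
        suffix = SUFFIX_BY_TAG.get(tag)
        splittable = (
            suffix is not None
            and word.lower() not in IRREGULAR
            and word.lower().endswith(suffix)
            and len(word) > len(suffix) + 1
        )
        if splittable:
            out.append((word[: -len(suffix)], "VB"))  # bare stem
            out.append(("-" + suffix, tag))           # suffix keeps the verb tag
        else:
            out.append((word, tag))                   # left as a single token
    return out


sentence = [("She", "PRP"), ("walked", "VBD"), ("home", "NN")]
print(split_verb_suffixes(sentence))
```

Because the split is keyed on the POS tag, a verb mis-tagged as a noun (say, "walks" tagged `NNS`) is left unsplit, which is one way a tagging error can propagate into the token sequence the parser sees.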
