Semi-supervised CCG Lexicon Extension

This paper introduces Chart Inference (CI), an algorithm for deriving a CCG category for an unknown word from a partial parse chart. We show that CI is faster and more precise than a brute-force baseline, and that it achieves wider coverage than a rule-based system. We also apply CI to a domain adaptation task for question words, which are largely missing from the Penn Treebank. When used in combination with self-training, CI increases the precision of the baseline StatCCG parser on subject-extraction questions by 50%. An error analysis shows that CI contributes to this increase by expanding the set of category types available to the parser, while self-training adjusts their counts.
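To make the idea of deriving a category for an unknown word concrete, the following is a minimal sketch and not the CI algorithm itself, whose details are not given in this abstract. It assumes a toy setting in which the unknown word has a single analysed neighbour and the category of the combined span is known, and it inverts only the two CCG application rules. The names `Functor`, `show`, `infer_from_right`, and `infer_from_left` are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Functor:
    """A complex category: result/arg (seeks arg to the right)
    or result\\arg (seeks arg to the left)."""
    result: "Cat"
    slash: str  # "/" or "\\"
    arg: "Cat"


# Atomic categories are plain strings such as "NP" or "S[dcl]".
Cat = Union[str, Functor]


def show(cat: Cat) -> str:
    """Render a category, parenthesising complex sub-categories."""
    if isinstance(cat, str):
        return cat

    def wrap(c: Cat) -> str:
        return show(c) if isinstance(c, str) else f"({show(c)})"

    return f"{wrap(cat.result)}{cat.slash}{wrap(cat.arg)}"


def infer_from_right(neighbour: Cat, target: Cat) -> List[Cat]:
    """Candidates for an unknown word immediately LEFT of `neighbour`,
    given that the two together should form `target`."""
    # Invert forward application (X/Y  Y => X): the unknown word is X/Y.
    candidates: List[Cat] = [Functor(target, "/", neighbour)]
    # Invert backward application (Y  X\Y => X): the unknown word is Y.
    if (isinstance(neighbour, Functor) and neighbour.slash == "\\"
            and neighbour.result == target):
        candidates.append(neighbour.arg)
    return candidates


def infer_from_left(neighbour: Cat, target: Cat) -> List[Cat]:
    """Candidates for an unknown word immediately RIGHT of `neighbour`."""
    # Invert backward application (Y  X\Y => X): the unknown word is X\Y.
    candidates: List[Cat] = [Functor(target, "\\", neighbour)]
    # Invert forward application (X/Y  Y => X): the unknown word is Y.
    if (isinstance(neighbour, Functor) and neighbour.slash == "/"
            and neighbour.result == target):
        candidates.append(neighbour.arg)
    return candidates


# Toy subject-extraction question: an unknown question word followed by a
# verb phrase S[dcl]\NP, where the whole span should be a question S[wq].
verb_phrase = Functor("S[dcl]", "\\", "NP")
question_word = [show(c) for c in infer_from_right(verb_phrase, "S[wq]")]

# Same verb phrase, but the span should be a declarative sentence S[dcl].
subject = [show(c) for c in infer_from_right(verb_phrase, "S[dcl]")]
```

In this sketch the question-word case yields a single functor candidate, while the declarative case also yields the plain argument category `NP`. A real chart-based method would additionally have to handle composition and type-raising, multiple neighbours, and ranking of competing candidates, none of which is attempted here.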
