Designing Agreement Features for Realization Ranking

This paper shows that incorporating linguistically motivated features for correct animacy and number agreement into an averaged perceptron ranking model for CCG realization further improves a state-of-the-art baseline. Traditionally, these features have been modelled using hard constraints in the grammar. However, given the graded nature of grammaticality judgements in the case of animacy, we argue for using a statistical model to rank competing preferences. Though subject-verb agreement is generally viewed as syntactic in nature, a perusal of relevant examples discussed in the theoretical linguistics literature (Kathol, 1999; Pollard and Sag, 1994) points toward the heterogeneous nature of English agreement. Compared to writing grammar rules, our method is more robust and allows information from diverse sources to be incorporated in realization. We also show that the perceptron model can reduce balanced punctuation errors that would otherwise require a post-filter. The full model yields significant improvements in BLEU scores on Section 23 of the CCGbank and makes many fewer agreement errors.
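The ranking idea can be illustrated with a minimal averaged perceptron sketch. It rests on our own assumptions and is not the authors' implementation. Each candidate realization is reduced to a sparse feature dictionary. The helper `agreement_features` and feature names such as `num:sg-pl` (subject number, verb number) and `rel:who-anim` (relative pronoun, antecedent animacy) are hypothetical. Each training item is assumed to be a candidate list plus the index of the preferred candidate. Rather than rejecting a disagreeing candidate outright, as a hard grammar constraint would, the model learns graded weights for these features.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a candidate realization is a sparse feature map.
Features = Dict[str, float]


def agreement_features(subj_num: str, verb_num: str,
                       rel_pronoun: Optional[str] = None,
                       antecedent_animate: Optional[bool] = None) -> Features:
    """Hypothetical number and animacy agreement features for one candidate."""
    feats: Features = {f"num:{subj_num}-{verb_num}": 1.0}
    if rel_pronoun is not None and antecedent_animate is not None:
        animacy = "anim" if antecedent_animate else "inanim"
        feats[f"rel:{rel_pronoun}-{animacy}"] = 1.0
    return feats


def score(weights: Dict[str, float], feats: Features) -> float:
    """Linear score: dot product of weights and features."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def rank(weights: Dict[str, float], candidates: List[Features]) -> int:
    """Index of the highest-scoring candidate (first one wins ties)."""
    return max(range(len(candidates)), key=lambda i: score(weights, candidates[i]))


def train_averaged_perceptron(data: List[Tuple[List[Features], int]],
                              epochs: int = 5) -> Dict[str, float]:
    """Perceptron ranking with naive weight averaging, for clarity."""
    weights: Dict[str, float] = defaultdict(float)
    totals: Dict[str, float] = defaultdict(float)
    steps = 0
    for _ in range(epochs):
        for candidates, gold in data:
            pred = rank(weights, candidates)
            if pred != gold:
                # Move weights toward the gold candidate, away from the prediction.
                for name, value in candidates[gold].items():
                    weights[name] += value
                for name, value in candidates[pred].items():
                    weights[name] -= value
            # Accumulate the current weights after every example.
            for name, value in weights.items():
                totals[name] += value
            steps += 1
    return {name: total / steps for name, total in totals.items()}
```

Averaging the weights over all training steps is the usual way to make perceptron rankers less sensitive to the last few updates. In a full realizer, features like these would sit alongside the baseline model's other features so that the perceptron can trade them off. Here they stand alone for clarity.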

[1] Michael White et al. Hypertagging: Supertagging for Surface Realization with CCG, 2008, ACL.

[2] Michael White et al. A More Precise Analysis of Punctuation for Broad-Coverage Surface Realization with CCG, 2008, COLING 2008.

[3] Jason Baldridge et al. Lexically specified derivational control in combinatory categorial grammar, 2002.

[4] Jun'ichi Tsujii et al. Probabilistic Models for Disambiguation of an HPSG-Based Chart Generator, 2005, IWPT.

[5] Josef van Genabith et al. Robust PCFG-Based Generation Using Automatically Acquired LFG Approximations, 2006, ACL.

[6] Stephan Oepen et al. Maximum Entropy Models for Realization Ranking, 2005.

[7] Brian Roark et al. Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm, 2004, ACL.

[8] Josef van Genabith et al. Dependency-Based N-Gram Models for General Purpose Sentence Realisation, 2008, COLING.

[9] James R. Curran et al. Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models, 2007, Computational Linguistics.

[10] Michael White et al. Efficient Realization of Coordinate Structures in Combinatory Categorial Grammar, 2006.

[11] Martin Kay et al. Syntactic Process, 1979, ACL.

[12] Michael White et al. Perceptron Reranking for CCG Realization, 2009, EMNLP.

[13] Daniel Gildea et al. The Proposition Bank: An Annotated Corpus of Semantic Roles, 2005, CL.

[14] Josef van Genabith et al. Exploiting Multi-Word Units in History-Based Probabilistic Generation, 2007, EMNLP-CoNLL.

[15] Ivan A. Sag et al. Book Reviews: Head-driven Phrase Structure Grammar and German in Head-driven Phrase-structure Grammar, 1996, CL.

[16] Jong-Bok Kim. Hybrid agreement in English, 2004.

[17] Jason Baldridge et al. Coupling CCG and Hybrid Logic Dependency Semantics, 2002, ACL.

[18] Tibor Kiss et al. Agreement and the Syntax-morphology Interface in HPSG, 1997.

[19] Michael White et al. Projecting Propbank Roles onto the CCGbank, 2008, LREC.

[20] 金 宗福 et al. Kim Jong Bok, 1996.

[21] Mark Steedman et al. CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank, 2007, CL.

[22] Andreas Kathol et al. Studies in Contemporary Phrase Structure Grammar: Agreement and the syntax-morphology interface in HPSG, 2000.

[23] Srinivas Bangalore et al. Exploiting a Probabilistic Hierarchical Model for Generation, 2000, COLING.