Linguistically Motivated Complementizer Choice in Surface Realization

This paper shows that using linguistically motivated features for English that-complementizer choice in an averaged perceptron classification model can improve upon the prediction accuracy of a state-of-the-art realization ranking model. We report results on a binary classification task that predicts the presence or absence of a that-complementizer, using features adapted from Jaeger's (2010) investigation of the uniform information density principle in the context of that-mentioning. Our experiments confirm the efficacy of the features based on Jaeger's work, including features based on information density. The experiments also show that the improvements in prediction accuracy extend to cases in which the presence of a that-complementizer arguably makes a substantial difference to fluency or intelligibility. Our ultimate goal is to improve the performance of a ranking model for surface realization; to this end, we conclude with a discussion of how we plan to combine the local complementizer-choice features with those in the global ranking model.
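To make the classification setup concrete, the sketch below trains an averaged perceptron on a binary label (+1 = "that" present, -1 = absent) over sparse feature dictionaries. It is an illustration under stated assumptions, not the model evaluated here: the feature names, the helpers `extract_features` and `cc_onset_surprisal`, the class `AveragedPerceptron`, and the toy counts are all hypothetical. The information-density feature is a simplified stand-in for Jaeger's measure: the surprisal of a complement clause given only the matrix verb, -log2 P(CC | verb), estimated with add-one smoothing. Weight averaging uses a common lazy trick (final weights w - u/c) rather than summing the weight vector after every example.

```python
import math
from collections import defaultdict


def cc_onset_surprisal(verb, cc_counts, verb_counts):
    """Hypothetical information-density feature: -log2 P(CC | matrix verb).

    A simplified stand-in for Jaeger's measure, estimated from raw counts
    with add-one smoothing over the two outcomes (CC follows / does not).
    """
    p = (cc_counts.get(verb, 0) + 1) / (verb_counts.get(verb, 0) + 2)
    return -math.log2(p)


def extract_features(ex, cc_counts, verb_counts):
    """Map a hypothetical example dict to a sparse feature dict."""
    return {
        "bias": 1.0,
        "verb=" + ex["verb"]: 1.0,
        "cc_subj_pronoun": 1.0 if ex["subj_is_pronoun"] else 0.0,
        "cc_subj_len": float(ex["cc_subj_len"]),
        "cc_onset_surprisal": cc_onset_surprisal(ex["verb"], cc_counts, verb_counts),
    }


class AveragedPerceptron:
    """Binary averaged perceptron over sparse features (labels +1 / -1)."""

    def __init__(self):
        self.w = defaultdict(float)  # current weights
        self.u = defaultdict(float)  # updates weighted by the example counter
        self.c = 1                   # example counter
        self.avg = {}                # averaged weights, filled in by train()

    def score(self, x, weights=None):
        weights = self.w if weights is None else weights
        return sum(weights.get(f, 0.0) * v for f, v in x.items())

    def train(self, data, epochs=10):
        for _ in range(epochs):
            for x, y in data:
                # Mistake-driven update: only when the current weights misclassify.
                if y * self.score(x) <= 0:
                    for f, v in x.items():
                        self.w[f] += y * v
                        self.u[f] += y * v * self.c
                self.c += 1
        # Lazy averaging: recover the averaged weights in one pass.
        self.avg = {f: self.w[f] - self.u[f] / self.c for f in self.w}

    def predict(self, x):
        return 1 if self.score(x, self.avg) > 0 else -1


# Toy usage with made-up counts and examples (illustrative only, not data from the paper).
cc_counts = {"think": 90, "confirm": 5}
verb_counts = {"think": 100, "confirm": 100}
raw = [
    ({"verb": "think", "subj_is_pronoun": True, "cc_subj_len": 1}, -1),
    ({"verb": "confirm", "subj_is_pronoun": False, "cc_subj_len": 4}, 1),
    ({"verb": "confirm", "subj_is_pronoun": False, "cc_subj_len": 3}, 1),
    ({"verb": "think", "subj_is_pronoun": True, "cc_subj_len": 2}, -1),
]
data = [(extract_features(ex, cc_counts, verb_counts), y) for ex, y in raw]
model = AveragedPerceptron()
model.train(data, epochs=10)
print([model.predict(x) for x, _ in data])  # → [-1, 1, 1, -1]
```

In this toy setting, a verb that frequently takes a complement clause ("think") yields a low-surprisal clause onset and, together with a pronominal subject, pushes the classifier toward omitting "that", which matches the direction the uniform information density account predicts. Averaging the weights is the usual way to reduce the perceptron's sensitivity to the last few updates.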
