Syntax-Based Grammaticality Improvement using CCG and Guided Search

Machine-produced text often lacks grammaticality and fluency. This paper studies grammaticality improvement using a syntax-based algorithm based on ccg. The goal of the search problem is to find an optimal parse tree among all that can be constructed through selection and ordering of the input words. The search problem, which is significantly harder than parsing, is solved by guided learning for best-first search. In a standard word ordering task, our system gives a BLEU score of 40.1, higher than the previous result of 33.7 achieved by a dependency-based system.

[1]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[2]  Giorgio Satta,et al.  Guided Learning for Bidirectional Sequence Classification , 2007, ACL.

[3]  Yoav Goldberg,et al.  An Efficient Algorithm for Easy-First Non-Directional Dependency Parsing , 2010, NAACL.

[4]  Adam Lopez,et al.  Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009) , 2009 .

[5]  Daniel Marcu,et al.  Learning as search optimization: approximate large margin methods for structured prediction , 2005, ICML.

[6]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[7]  Eugene Charniak,et al.  Figures of Merit for Best-First Probabilistic Chart Parsing , 1998, Comput. Linguistics.

[8]  F ROSENBLATT,et al.  The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.

[9]  Mark Johnson,et al.  Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques , 2002, ACL.

[10]  Stephen Wan,et al.  Improving Grammaticality in Statistical Sentence Generation: Introducing a Dependency Spanning Tree Algorithm with an Argument Satisfaction Model , 2009, EACL.

[11]  Michael White,et al.  Reining in CCG Chart Realization , 2004, INLG.

[12]  William J. Byrne,et al.  Fluency Constraints for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices , 2010, COLING.

[13]  Mark Steedman,et al.  CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank , 2007, CL.

[14]  Miles Osborne,et al.  Statistical Machine Translation , 2010, Encyclopedia of Machine Learning and Data Mining.

[15]  Michael White,et al.  Hypertagging: Supertagging for Surface Realization with CCG , 2008, ACL.

[16]  Aravind K. Joshi,et al.  LTAG Dependency Parsing with Bidirectional Incremental Construction , 2008, EMNLP.

[17]  Chris Mellish,et al.  Instance-Based Natural Language Generation , 2001, NAACL.

[18]  Srinivas Bangalore,et al.  Evaluation Metrics for Generation , 2000, INLG.

[19]  Julia Hockenmaier Parsing with Generative Models of Predicate-Argument Structure , 2003, ACL.

[20]  James R. Curran,et al.  Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models , 2007, Computational Linguistics.

[21]  Martin Kay,et al.  Syntactic Process , 1979, ACL.

[22]  Michael White,et al.  Perceptron Reranking for CCG Realization , 2009, EMNLP.