Weakly-Supervised Grammar-Informed Bayesian CCG Parser Learning

Combinatory Categorial Grammar (CCG) is a lexicalized grammar formalism in which words are associated with categories that, in combination with a small universal set of rules, specify the syntactic configurations in which they may occur. Previous work has shown that learning sequence models for CCG tagging can be improved by using priors that are sensitive to the formal properties of CCG as well as cross-linguistic universals. We extend this approach to the task of learning a full CCG parser from weak supervision. We present a Bayesian formulation for CCG parser induction that assumes only supervision in the form of an incomplete tag dictionary mapping some word types to sets of potential categories. Our approach outperforms a baseline model trained with uniform priors by exploiting universal, intrinsic properties of the CCG formalism to bias the model toward simpler, more cross-linguistically common categories.
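To make the ideas in the abstract concrete, the following minimal sketch shows how lexical categories combine under the two application rules, and how a prior can favor simpler categories among those a tag dictionary allows for a word. It is an illustration under stated assumptions, not the paper's model: the class names, the `size` complexity measure, and the geometric `decay` weighting are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Union


@dataclass(frozen=True)
class Atom:
    """An atomic category such as S or NP."""
    name: str


@dataclass(frozen=True)
class Functor:
    """A complex category: result/arg seeks arg to the right, result\\arg to the left."""
    result: "Category"
    slash: str
    arg: "Category"


Category = Union[Atom, Functor]


def forward_apply(left: Category, right: Category) -> Optional[Category]:
    """Forward application: X/Y followed by Y yields X."""
    if isinstance(left, Functor) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def backward_apply(left: Category, right: Category) -> Optional[Category]:
    """Backward application: Y followed by X\\Y yields X."""
    if isinstance(right, Functor) and right.slash == "\\" and right.arg == left:
        return right.result
    return None


def size(cat: Category) -> int:
    """Number of atoms in a category (a hypothetical complexity measure)."""
    if isinstance(cat, Atom):
        return 1
    return size(cat.result) + size(cat.arg)


def simplicity_prior(cats: Iterable[Category], decay: float = 0.5) -> Dict[Category, float]:
    """Normalized weights that shrink geometrically with category size (assumed form)."""
    weights = {c: decay ** size(c) for c in cats}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Toy lexicon fragment: intransitive verb S\NP and transitive verb (S\NP)/NP.
S, NP = Atom("S"), Atom("NP")
IV = Functor(S, "\\", NP)
TV = Functor(IV, "/", NP)

# "dogs chase cats": the verb takes its object first, then its subject.
verb_phrase = forward_apply(TV, NP)
sentence = backward_apply(NP, verb_phrase)

# A tag dictionary entry listing two candidate categories for one word type.
prior = simplicity_prior({IV, TV})
```

Here `sentence` is the atomic category `S`, and the prior assigns the smaller category `S\NP` more mass than `(S\NP)/NP`, which is the kind of bias toward simpler categories the abstract describes.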
