Something Old, Something New: Grammar-based CCG Parsing with Transformer Models

This report describes the parsing problem for Combinatory Categorial Grammar (CCG), showing how a combination of Transformer-based neural models and a symbolic CCG grammar can lead to substantial gains over existing approaches. The report also documents a 20-year research program, showing how NLP methods have evolved over this time. The staggering accuracy improvements provided by neural models for CCG parsing can be seen as a reflection of the improvements seen in NLP more generally. The report provides a minimal introduction to CCG and CCG parsing, with many pointers to the relevant literature. It then describes the CCG supertagging problem, and recent work by Tian et al. (2020) that applies Transformer-based models to supertagging with great effect. I use this existing model to develop a CCG multitagger, which can serve as a front-end to an existing CCG parser. Simply using this new multitagger provides substantial gains in parsing accuracy. I then show how a Transformer-based model from the parsing literature can be combined with the grammar-based CCG parser, setting a new state of the art for the CCGbank parsing task of almost 93% F-score for labelled dependencies, with complete sentence accuracies of over 50%.

arXiv:2109.10044v1 [cs.CL] 21 Sep 2021

[1]  Stephen Clark,et al.  Expected F-Measure Training for Shift-Reduce Parsing with Recurrent Neural Networks , 2016, HLT-NAACL.

[2]  Dan Klein,et al.  Constituency Parsing with a Self-Attentive Encoder , 2018, ACL.

[3]  Johan Bos,et al.  The Groningen Meaning Bank , 2013, JSSP.

[4]  Dan Klein,et al.  A Minimal Span-Based Neural Constituency Parser , 2017, ACL.

[5]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[6]  Stephen Clark,et al.  Shift-Reduce CCG Parsing with a Dependency Model , 2014, ACL.

[7]  James R. Curran,et al.  Investigating GIS and Smoothing for Maximum Entropy Taggers , 2003, EACL.

[8]  Michael Collins,et al.  Three Generative, Lexicalised Models for Statistical Parsing , 1997, ACL.

[9]  Aravind K. Joshi,et al.  An Introduction to Tree Adjoining Grammar , 1987 .

[10]  Noam Chomsky,et al.  Aspects of the theory of syntax , 1965 .

[11]  David J. Weir,et al.  The convergence of mildly context-sensitive grammar formalisms , 1990 .

[12]  Stephen Clark,et al.  Shift-Reduce CCG Parsing , 2011, ACL.

[13]  Luke S. Zettlemoyer,et al.  LSTM CCG Parsing , 2016, NAACL.

[14]  Mark Steedman,et al.  CCG Parsing Algorithm with Incremental Tree Rotation , 2019, NAACL-HLT.

[15]  S. Clark,et al.  The Java Version of the C&C Parser Version 0.95 , 2015 .

[16]  Ashish Vaswani,et al.  Supertagging With LSTMs , 2016, NAACL.

[17]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[18]  Mark Steedman,et al.  Max-Margin Incremental CCG Parsing , 2020, ACL.

[19]  Baobao Chang,et al.  Graph-based Dependency Parsing with Bidirectional LSTM , 2016, ACL.

[20]  Stephen Clark,et al.  Adapting a Lexicalized-Grammar Parser to Contrasting Domains , 2008, EMNLP.

[21]  Mark Steedman,et al.  Generative Models for Statistical Parsing with Combinatory Categorial Grammar , 2002, ACL.

[22]  Quoc V. Le,et al.  Semi-Supervised Sequence Modeling with Cross-View Training , 2018, EMNLP.

[23]  Jonathan K. Kummerfeld,et al.  Large-Scale Syntactic Processing: Parsing the Web. Final Report of the 2009 JHU CLSP Workshop , 2009 .

[24]  Dimitri Kartsaklis,et al.  A CCG-Based Version of the DisCoCat Framework , 2021, SEMSPACE.

[25]  Mark Johnson,et al.  Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques , 2002, ACL.

[26]  Stephen Clark,et al.  CCG Supertagging with a Recurrent Neural Network , 2015, ACL.

[27]  David J. Weir.  A Geometric Hierarchy Beyond Context-Free Languages , 1992, Theor. Comput. Sci.

[28]  Mark Steedman,et al.  Wide-Coverage Semantic Representations from a CCG Parser , 2004, COLING.

[29]  Adwait Ratnaparkhi,et al.  A Maximum Entropy Model for Part-Of-Speech Tagging , 1996, EMNLP.

[30]  Giorgio Satta,et al.  Lexicalization and Generative Power in CCG , 2015, CL.

[31]  Luke S. Zettlemoyer,et al.  Global Neural CCG Parsing with Optimality Guarantees , 2016, EMNLP.

[32]  James R. Curran,et al.  The Importance of Supertagging for Wide-Coverage CCG Parsing , 2004, COLING.

[33]  Stephen Clark,et al.  Mathematical Foundations for a Compositional Distributional Model of Meaning , 2010, ArXiv.

[34]  Jungo Kasai,et al.  End-to-End Graph-Based TAG Parsing with Neural Networks , 2018, NAACL.

[35]  Mark Steedman,et al.  Unbounded Dependency Recovery for Parser Evaluation , 2009, EMNLP.

[36]  Mark Steedman,et al.  CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank , 2007, CL.

[37]  Johan Bos,et al.  Linguistically Motivated Large-Scale NLP with C&C and Boxer , 2007, ACL.

[38]  Jason Baldridge,et al.  Lexically specified derivational control in combinatory categorial grammar , 2002 .

[39]  Jun'ichi Tsujii,et al.  Feature Forest Models for Probabilistic HPSG Parsing , 2008, CL.

[40]  Stuart M. Shieber,et al.  Evidence against the context-freeness of natural language , 1985 .

[41]  Gerald Penn,et al.  Supertagging with CCG primitives , 2020, REPL4NLP.

[42]  Mirella Lapata,et al.  Universal Discourse Representation Structure Parsing , 2021, CL.

[43]  Yuji Matsumoto,et al.  A* CCG Parsing with a Supertag and Dependency Factored Model , 2017, ACL.

[44]  Shalom Lappin.  Deep Learning and Linguistic Representation , 2021 .

[45]  Yoav Artzi,et al.  Learning Compact Lexicons for CCG Semantic Parsing , 2014, EMNLP.

[46]  Yang Guo,et al.  Structured Perceptron with Inexact Search , 2012, NAACL.

[47]  Mark Steedman,et al.  Building Deep Dependency Structures using a Wide-Coverage CCG Parser , 2002, ACL.

[48]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[49]  James R. Curran,et al.  Perceptron Training for a Wide-Coverage Lexicalized-Grammar Parser , 2007, ACL.

[50]  Thorsten Brants,et al.  TnT – A Statistical Part-of-Speech Tagger , 2000, ANLP.

[51]  Michael Moortgat,et al.  Categorial Type Logics , 1997, Handbook of Logic and Language.

[52]  Ted Briscoe,et al.  Parser evaluation: a survey and a new proposal , 1998, LREC.

[53]  James R. Curran,et al.  Parsing the WSJ Using CCG and Log-Linear Models , 2004, ACL.

[54]  Luke S. Zettlemoyer,et al.  Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars , 2005, UAI.

[55]  Danqi Chen,et al.  A Fast and Accurate Dependency Parser using Neural Networks , 2014, EMNLP.

[56]  Luke S. Zettlemoyer,et al.  Broad-coverage CCG Semantic Parsing with AMR , 2015, EMNLP.

[57]  Mark Steedman,et al.  Surface structure and interpretation , 1996, Linguistic inquiry.

[58]  Michael Collins,et al.  Prepositional Phrase Attachment through a Backed-off Model , 1995, VLC@ACL.

[59]  Dan Klein,et al.  Multilingual Constituency Parsing with Self-Attention and Pre-Training , 2018, ACL.

[60]  James R. Curran,et al.  Faster Parsing by Supertagger Adaptation , 2010, ACL.

[61]  Wenduan Xu,et al.  LSTM Shift-Reduce CCG Parsing , 2016, EMNLP.

[62]  Brian Roark,et al.  Incremental Parsing with the Perceptron Algorithm , 2004, ACL.

[63]  Fei Xia,et al.  Supertagging Combinatory Categorial Grammar with Attentive Graph Convolutional Networks , 2020, EMNLP.

[64]  Giorgio Satta,et al.  On the Complexity of CCG Parsing , 2017, CL.

[65]  Mark Chen,et al.  Language Models are Few-Shot Learners , 2020, NeurIPS.

[66]  Dimitri Kartsaklis,et al.  QNLP in Practice: Running Compositional Models of Meaning on a Quantum Computer , 2021, ArXiv.

[67]  Y. Bar-Hillel.  A Quasi-Arithmetical Notation for Syntactic Description , 1953 .

[68]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[69]  David J. Weir,et al.  Parsing Some Constrained Grammar Formalisms , 1993, CL.

[70]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[71]  Srinivas Bangalore,et al.  Supertagging: An Approach to Almost Parsing , 1999, CL.

[72]  James R. Curran,et al.  Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models , 2007, CL.

[73]  Martin Kay,et al.  Syntactic Process , 1979, ACL.

[74]  Gerald Penn,et al.  Accurate Context-Free Parsing with Combinatory Categorial Grammar , 2010, ACL.

[75]  Stephen Clark,et al.  Evaluating a Wide-Coverage CCG Parser , 2013 .