Automatic Generation of High Quality CCGbanks for Parser Domain Adaptation

We propose a new domain adaptation method for Combinatory Categorial Grammar (CCG) parsing, based on automatically generating CCG corpora from dependency trees, which are a cheaper resource to obtain. Our solution is conceptually simple and does not rely on a specific parser architecture, making it applicable to the current best-performing parsers. We conduct extensive parsing experiments with detailed discussion; in addition to existing benchmark datasets on (1) biomedical texts and (2) question sentences, we create experimental datasets of (3) speech conversation and (4) math problems. When the proposed method is applied, an off-the-shelf CCG parser shows significant performance gains, improving from 90.7% to 96.6% on speech conversation and from 88.5% to 96.8% on math problems.
