Improving Domain Independent Question Parsing with Synthetic Treebanks

Automatic syntactic parsing of question constructions is challenging because most treebanks contain few training examples of them. This near absence stems from the dominance of the news domain in treebanking efforts. In this paper, we compare two low-cost synthetic methods for creating question treebanks with a conventional high-cost manual annotation method across three domains (news questions, political talk shows, and chatbots) for Modern Standard Arabic, a morphologically rich language with relatively few resources. Our results show that synthetic methods can significantly reduce parsing errors for a target domain without a large investment in manual annotation, and that combining manual and synthetic methods yields our best domain-independent performer.
