Hard Time Parsing Questions: Building a QuestionBank for French

We present the French Question Bank, a treebank of 2600 questions. We show that the performance of classical parsing models drops when facing out-of-domain data with strong structural divergences, while the inclusion of this data set is highly beneficial without harming the parsing of non-question data. With two thirds of its questions aligned with the English QuestionBank (Judge et al., 2006), and being freely available, this treebank will prove useful for building robust NLP systems.

[1] Josef van Genabith, et al. QuestionBank: Creating a Corpus of Parse-Annotated Questions, 2006, ACL.

[2] Joakim Nivre, et al. MaltParser: A Data-Driven Parser-Generator for Dependency Parsing, 2006, LREC.

[3] Dan Klein, et al. Learning Accurate, Compact, and Interpretable Tree Annotation, 2006, ACL.

[4] Anne Abeillé, et al. Enriching a French Treebank, 2004, LREC.

[5] Mario Vento, et al. A (sub)graph isomorphism algorithm for matching large graphs, 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6] Pascal Denis, et al. Coupling an Annotated Corpus and a Morphosyntactic Lexicon for State-of-the-Art POS Tagging with Less Human Effort, 2009, PACLIC.

[7] Benoît Sagot, et al. The French Social Media Bank: a Treebank of Noisy User Generated Content, 2012, COLING.

[8] Joakim Nivre, et al. Benchmarking of Statistical Dependency Parsers for French, 2010, COLING.

[9] Eugene Charniak, et al. A Maximum-Entropy-Inspired Parser, 2000, ANLP.

[10] Marie Candito, et al. Le corpus Sequoia : annotation syntaxique et exploitation pour l’adaptation d’analyseur par pont lexical (The Sequoia Corpus : Syntactic Annotation and Use for a Parser Lexical Domain Adaptation Method) [in French], 2012, JEP/TALN/RECITAL.

[11] Pascal Denis, et al. Statistical French Dependency Parsing: Treebank Conversion and First Results, 2010, LREC.

[12] Alexandra Kinyon, et al. Building a Treebank for French, 2000, LREC.

[13] David McClosky, et al. Parsing Paraphrases with Joint Inference, 2015, ACL.

[14] Josef van Genabith, et al. Treebank-Based Acquisition of LFG Parsing Resources for French, 2008, LREC.

[15] Marie Candito, et al. Improving generative statistical parsing with semi-supervised word clustering, 2009, IWPT.

[16] Slav Petrov, et al. Uptraining for Accurate Deterministic Question Parsing, 2010, EMNLP.

[17] Maarten de Rijke, et al. Creating the DISEQuA Corpus: a Test Set for Multilingual Question Answering, 2003, CLEF.

[18] Grzegorz Chrupala, et al. Towards a machine-learning architecture for lexical functional grammar parsing, 2008.