论文信息 - Project PIAF: Building a Native French Question-Answering Dataset

Project PIAF: Building a Native French Question-Answering Dataset

Motivated by the lack of data for non-English languages, in particular for the evaluation of downstream tasks such as Question Answering, we present a participatory effort to collect a native French Question Answering Dataset. Furthermore, we describe and publicly release the annotation tool developed for our collection effort, along with the data obtained and preliminary baselines.

[1] Wentao Ma,et al. A Span-Extraction Dataset for Chinese Machine Reading Comprehension , 2019, EMNLP-IJCNLP.

[2] Yi Yang,et al. WikiQA: A Challenge Dataset for Open-Domain Question Answering , 2015, EMNLP.

[3] Sebastian Riedel,et al. Constructing Datasets for Multi-hop Reading Comprehension Across Documents , 2017, TACL.

[4] Han Hu,et al. XCMRC: Evaluating Cross-lingual Machine Reading Comprehension , 2019, NLPCC.

[5] Sampo Pyysalo,et al. brat: a Web-based Tool for NLP-Assisted Text Annotation , 2012, EACL.

[6] Danqi Chen,et al. CoQA: A Conversational Question Answering Challenge , 2018, TACL.

[7] Sebastian Riedel,et al. MLQA: Evaluating Cross-lingual Extractive Question Answering , 2019, ACL.

[8] Philip Bachman,et al. NewsQA: A Machine Comprehension Dataset , 2016, Rep4NLP@ACL.

[9] Bowen Zhou,et al. Applying deep learning to answer selection: A study and an open task , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[10] Percy Liang,et al. Know What You Don’t Know: Unanswerable Questions for SQuAD , 2018, ACL.

[11] Michael S. Bernstein,et al. Daemo: A Self-Governed Crowdsourcing Marketplace , 2015, UIST.

[12] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[13] Yoshua Bengio,et al. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering , 2018, EMNLP.

[14] Seung-won Hwang,et al. Semi-supervised Training Data Generation for Multilingual Question Answering , 2018, LREC.

[15] Iryna Gurevych,et al. WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations , 2013, ACL.

[16] Eunsol Choi,et al. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension , 2017, ACL.

[17] Laurent Romary,et al. CamemBERT: a Tasty French Language Model , 2019, ACL.

[18] Jian Zhang,et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.

[19] Yoav Goldberg,et al. Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets , 2019, EMNLP.