SberQuAD - Russian Reading Comprehension Dataset: Description and Analysis

SberQuAD, a large-scale analog of the Stanford SQuAD in the Russian language, is a valuable resource that has not been properly presented to the scientific community. We fill this gap by providing a description, a thorough analysis, and baseline experimental results.
