Paper information - Enlargement of the Czech Question-Answering Dataset to SQAD v2.0

In this paper, we present the second version of the Czech question-answering dataset, called SQAD v2.0 (Simple Question Answering Database). The new version is a large extension of our original SQAD database. In the current release, the dataset contains nearly 9,000 question-answer pairs, complemented with manual annotation of question and answer types. All texts in the dataset (the source documents, the questions, and the respective answers) are provided with complete morphological annotation in a plain-text format. We offer detailed statistics of the SQAD v2.0 dataset based on the new QA annotation.

Aleš Horák | Marek Medveď | Terézia Šulganová
