Paper Information - The First Wikipedia Questions and Factoid Answers Corpus in the Thai Language

This article introduces a Thai question-answer corpus for the question-answering task, extracted from a dump of Thai Wikipedia downloaded on 17 December 2017. The answers comprise 5,000 annotated factoids. Each corresponding question is either an exact phrase or sentence from the page in which the answer has been replaced by a question word, or a synthetic question composed from phrases and/or sentences on the wiki page. A question must contain exactly one of a set of 7 specific question words, and complex questions must be avoided. Fifteen annotators used an annotation system designed specifically for this task. Acceptance, rejection, and revision were monitored by a language specialist. The final set was divided into a training set of 4,000 pairs and a validation set of 1,000 pairs. In a baseline evaluation, the document reader obtained an F1 score of 27.25 and the document retriever obtained 71.24.
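The abstract does not say how the F1 scores were computed. As a rough illustration only, the sketch below assumes a token-overlap F1 of the kind commonly used for extractive question answering, averaged over all pairs and scaled to 0-100. The function names `token_f1` and `corpus_f1` are hypothetical, and the inputs are assumed to be already tokenized, since Thai text has no whitespace word boundaries and needs a separate word segmenter.

```python
from collections import Counter


def token_f1(predicted, gold):
    """Token-overlap F1 between two token lists (assumed metric, not confirmed by the paper)."""
    # If either side is empty, score 1.0 only when both are empty.
    if not predicted or not gold:
        return float(predicted == gold)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(predicted) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def corpus_f1(pairs):
    """Mean F1 over (predicted_tokens, gold_tokens) pairs, scaled to 0-100."""
    return 100.0 * sum(token_f1(p, g) for p, g in pairs) / len(pairs)
```

Under this assumption, a prediction that contains the full gold answer plus extra tokens is penalized through precision, while a prediction that misses part of the gold answer is penalized through recall.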

Pornpimon Palingoon | Kanokorn Trakultaweekoon | Anocha Rugchatjaroen | Santipong Thaiprayoon
