Paper Information - DARC-IT: a DAtaset for Reading Comprehension in ITalian

English. In this paper, we present DARC-IT, a new reading comprehension dataset for the Italian language aimed at identifying ‘question-worthy’ sentences, i.e. sentences in a text that contain information worth asking a question about. The purpose of the corpus is twofold: to investigate the linguistic profile of question-worthy sentences and to support the development of automatic question generation systems.

Italian. In this contribution, we present DARC-IT, a new reading comprehension corpus for the Italian language for identifying sentences that lend themselves to being the subject of a question. The purpose of this corpus is twofold: to study the linguistic profile of informative sentences and to provide a training corpus in support of an automatic system for generating questions of
