Neural Learning for Question Answering in Italian

Recent breakthroughs in deep learning have led to state-of-the-art results in several NLP tasks, including Question Answering (QA). However, the training requirements of neural methods are often unmet in cross-lingual settings: datasets suitable for training question answering systems in non-English languages are frequently unavailable, which is a significant barrier for most neural methods. This paper explores the possibility of acquiring a large-scale, although lower-quality, dataset for an open-domain factoid question answering system in Italian. The dataset consists of more than 60 thousand question-answer pairs and was used to train a system able to answer factoid questions against the Italian Wikipedia. The paper describes the dataset and the experiments, which were inspired by an equivalent counterpart for English. The experiments show that the results achievable for Italian are worse than those for English, although they are already applicable to concrete QA tasks.
