Sentence and Word Embedding Employed in Open Question-Answering

The Automatic Question Answering (AQA) system is a representative of open-domain QA systems in which the answer selection process relies on syntactic and semantic similarities between the question and the candidate answer snippets. Such an approach is oriented specifically to languages with fine-grained syntactic and morphological features that help to guide the correct QA match. In this paper, we present the latest results of the AQA system with a newly implemented word embedding criterion. All AQA processing steps (question processing, answer selection and answer extraction) are syntax-based, with advanced scoring obtained by a combination of several similarity criteria (TF-IDF, tree distance, ...). Adding the word embedding parameters helped to resolve the QA match in cases where the answer is expressed by semantically near equivalents. We describe the design and implementation of the whole QA process and provide a new evaluation of the AQA system with the word embedding criterion, measured on an expanded version of the Simple Question-Answering Database (SQAD) with more than 3000 question-answer pairs extracted from the Czech Wikipedia.
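As a rough illustration of how several similarity criteria might be combined during answer selection, the following minimal sketch scores candidate sentences by a weighted sum of a TF-IDF cosine and an averaged word-embedding cosine. It is not the AQA implementation: the function names, the weights `w_tfidf` and `w_emb`, the toy two-dimensional embeddings and the English example tokens are all assumptions made for illustration, and the tree distance criterion is omitted.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def tfidf_vectors(token_lists):
    """Build dense TF-IDF vectors (smoothed IDF) over a shared vocabulary."""
    n = len(token_lists)
    vocab = sorted({t for toks in token_lists for t in toks})
    df = Counter(t for toks in token_lists for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in token_lists:
        tf = Counter(toks)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vectors


def mean_embedding(tokens, embeddings, dim):
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def select_answer(question, candidates, embeddings, dim, w_tfidf=0.5, w_emb=0.5):
    """Return (index of best candidate, list of combined scores).

    The weights are hypothetical; a real system would tune them on held-out data.
    """
    vecs = tfidf_vectors([question] + candidates)
    q_emb = mean_embedding(question, embeddings, dim)
    scores = []
    for i, cand in enumerate(candidates):
        s_tfidf = cosine(vecs[0], vecs[i + 1])
        s_emb = cosine(q_emb, mean_embedding(cand, embeddings, dim))
        scores.append(w_tfidf * s_tfidf + w_emb * s_emb)
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores


# Toy data: "wrote" and "authored" are near equivalents in the made-up embedding space.
toy_embeddings = {
    "wrote": [1.0, 0.0],
    "authored": [0.9, 0.1],
    "village": [0.0, 1.0],
}
question = ["who", "wrote", "hamlet"]
candidates = [
    ["shakespeare", "authored", "hamlet"],
    ["hamlet", "is", "a", "village"],
]
best, scores = select_answer(question, candidates, toy_embeddings, dim=2)
```

In this toy setting both candidates share only the token "hamlet" with the question, so their TF-IDF scores are close; the embedding term is what separates the first candidate, which expresses the answer with a near equivalent ("authored" for "wrote").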
