Hybrid query expansion using lexical resources and word embeddings for sentence retrieval in question answering

Abstract

Question Answering (QA) systems based on Information Retrieval return precise answers to natural language questions by extracting relevant sentences from document collections. However, questions and sentences may not be terminologically aligned, which causes errors in sentence retrieval. To improve the effectiveness of retrieving relevant sentences from documents, this paper proposes a hybrid Query Expansion (QE) approach for QA systems, based on lexical resources and word embeddings. First, synonyms and hypernyms of the relevant terms occurring in the question are extracted from MultiWordNet and then contextualized to the document collection used by the QA system. Finally, the resulting set of terms is ranked and filtered according to the wording and sense of the question, using a semantic similarity metric built on top of a Word2Vec model. The latter is trained locally on an extended corpus covering the same topic as the documents used by the QA system. The QE approach is implemented in an existing QA system and evaluated experimentally, for the Italian language and in the Cultural Heritage domain, against different possible configurations and selected baselines. The evaluation assesses its effectiveness in retrieving sentences that contain correct answers to questions belonging to four different categories.
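The contextualize, rank and filter steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that candidate terms (standing in for MultiWordNet synonyms and hypernyms) are already available as a list, that "contextualized to the document collection" means keeping only candidates attested in the collection vocabulary, and that the "sense of the question" is approximated by the centroid of the question-term embeddings, with cosine similarity as the metric. The names `expand_query`, `threshold` and `top_k`, as well as the toy 3-dimensional vectors replacing a locally trained Word2Vec model, are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def expand_query(
    question_terms: List[str],
    candidates: List[str],
    vocabulary: Set[str],
    embeddings: Dict[str, Vector],
    threshold: float = 0.8,  # hypothetical similarity cut-off
    top_k: int = 5,  # hypothetical maximum number of expansion terms
) -> List[Tuple[str, float]]:
    """Hypothetical helper: contextualize, rank and filter candidate terms."""
    # Contextualization (assumed): keep only candidates that occur in the
    # document collection and have an embedding.
    in_collection = [c for c in candidates if c in vocabulary and c in embeddings]

    # Assumption: the question is represented by the centroid of the
    # embeddings of its terms; unknown terms are skipped.
    question_vectors = [embeddings[t] for t in question_terms if t in embeddings]
    if not question_vectors:
        return []
    question_vector = centroid(question_vectors)

    # Rank by cosine similarity to the question vector, drop weak candidates.
    scored = [(c, cosine(embeddings[c], question_vector)) for c in in_collection]
    kept = [(c, s) for c, s in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Toy vectors standing in for a locally trained Word2Vec model (hypothetical).
EMBEDDINGS: Dict[str, Vector] = {
    "dipinto": [0.9, 0.1, 0.0],
    "museo": [0.7, 0.3, 0.1],
    "quadro": [0.85, 0.15, 0.05],
    "pittura": [0.8, 0.2, 0.0],
    "rappresentazione": [0.1, 0.2, 0.9],
    "tela": [0.9, 0.1, 0.1],
}
# Terms attested in the toy document collection; "tela" is deliberately absent.
VOCABULARY: Set[str] = {"dipinto", "museo", "quadro", "pittura", "rappresentazione"}
# Hand-written stand-ins for MultiWordNet synonyms and hypernyms.
CANDIDATES = ["quadro", "pittura", "rappresentazione", "tela"]

if __name__ == "__main__":
    result = expand_query(["dipinto", "museo"], CANDIDATES, VOCABULARY, EMBEDDINGS)
    print([term for term, _ in result])  # ['pittura', 'quadro']
```

In this toy run, "tela" is discarded because it does not occur in the collection, and the hypernym-like "rappresentazione" is filtered out because its vector points away from the question centroid. The centroid-plus-cosine choice is only one simple way to compare candidates with the whole question rather than with a single term. The metric actually used in the paper may differ.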
