Enhancing passage retrieval in log files by query expansion based on explicit and pseudo relevance feedback

Passage retrieval is usually defined as the task of searching for passages that may contain the answer to a given query. While passage retrieval approaches are very efficient on natural-language texts, when applied to log files (i.e. semi-structured data containing both numerical and symbolic information) they usually return irrelevant or useless results. One appealing way to improve the results is query expansion, which automatically or semi-automatically adds information to the query to improve the reliability and accuracy of the returned results. In this paper, we present a new approach for enhancing the relevance of queries during passage retrieval in log files. It is based on two relevance feedback steps. In the first, we determine the explicit relevance feedback by identifying the context of the requested information within a learning process. The second step is a new kind of pseudo relevance feedback. Based on a novel term weighting measure, it assigns a weight to terms according to their relatedness to queries. This measure, called TRQ (Term Relatedness to Query), is used to identify the most relevant expansion terms. The main advantage of our approach is that it can be applied both to log files and to documents from general domains. Experiments conducted on real data from logs and documents show that our query expansion protocol enables retrieval of relevant passages.
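
As an illustration of the general idea behind pseudo relevance feedback, the following minimal Python sketch ranks passages against the initial query, treats the top-ranked ones as relevant, and adds their highest-weighted terms to the query. It is not the method of this paper: the abstract does not define TRQ, so the weight used here (term frequency in the feedback passages times an idf-style factor) is only a stand-in, and the names `tokenize`, `rank_passages`, `expand_query`, `top_k` and `n_terms` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplifying assumption)."""
    return [t for t in re.split(r"[^a-z0-9_]+", text.lower()) if t]


def rank_passages(query_terms, passages):
    """Order passage indices by how many distinct query terms each passage contains."""
    scores = []
    for i, p in enumerate(passages):
        tokens = set(tokenize(p))
        scores.append((sum(1 for q in set(query_terms) if q in tokens), i))
    # Highest overlap first; ties keep the original passage order.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores]


def expand_query(query, passages, top_k=2, n_terms=2):
    """Pseudo relevance feedback with a placeholder weight (NOT the paper's TRQ)."""
    query_terms = tokenize(query)
    ranking = rank_passages(query_terms, passages)
    # Assume the top_k passages are relevant without asking the user.
    feedback = [tokenize(passages[i]) for i in ranking[:top_k]]

    # Document frequency of each term over all passages, for an idf-style factor.
    df = Counter()
    for p in passages:
        df.update(set(tokenize(p)))

    weights = Counter()
    for tokens in feedback:
        for term, tf in Counter(tokens).items():
            if term not in query_terms:
                weights[term] += tf * math.log(1 + len(passages) / df[term])

    # Sort by weight, then alphabetically so the result is deterministic.
    best = sorted(weights, key=lambda t: (-weights[t], t))[:n_terms]
    return query_terms + best
```

Calling `expand_query("timing slack", passages)` on a list of log passages returns the original query terms followed by up to `n_terms` expansion terms drawn from the passages that best matched the initial query. Swapping the placeholder weight for a measure of relatedness to the query is where a measure such as TRQ would plug in.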
