Entity query feature expansion using knowledge base links

Recent advances in automatic entity linking and knowledge base construction have resulted in entity annotations for document and query collections, for example annotations of entities from large general-purpose knowledge bases such as Freebase and the Google Knowledge Graph. Understanding how to leverage these entity annotations of text to improve ad hoc document retrieval is an open research area. Query expansion is a commonly used technique to improve retrieval effectiveness. Most previous query expansion approaches focus on text, mainly using unigram concepts. In this paper, we propose a new technique, called entity query feature expansion (EQFE), which enriches the query with features from entities and their links to knowledge bases, including structured attributes and text. We experiment using both explicit query entity annotations and latent entities. We evaluate our technique on TREC text collections automatically annotated with knowledge base entity links, including the Google Freebase Annotations (FACC1) data. We find that entity-based feature expansion results in significant improvements in retrieval effectiveness over state-of-the-art text expansion approaches.
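The general idea of expanding a query with features from linked entities can be illustrated with a minimal sketch. It is not the EQFE model described in the paper; it is a simple weighted term-expansion scheme that interpolates the original query's term distribution with terms drawn from linked entities' knowledge-base fields. The toy knowledge base `KB`, the field names, the per-field weights in `FIELD_WEIGHTS`, the interpolation weight `lam`, and the assumption that an entity linker returns (entity ID, confidence) pairs are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical knowledge-base records; field names and contents are
# illustrative assumptions, not the actual Freebase schema.
KB = {
    "m.obama": {
        "name": "Barack Obama",
        "aliases": ["Obama", "President Obama"],
        "types": ["politician", "us president"],
    },
}

# Hypothetical weights controlling how much each KB field contributes.
FIELD_WEIGHTS = {"name": 1.0, "aliases": 0.5, "types": 0.25}


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def normalize(counts):
    """Turn raw term counts into a probability distribution."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def expand_query(query, entity_links, kb, lam=0.6):
    """Return a term -> weight map mixing the original query terms
    (weight lam) with terms from linked entities' KB fields (weight 1 - lam).

    entity_links: list of (entity_id, confidence) pairs, assumed to come
    from an entity linker run over the query.
    """
    original = Counter(tokenize(query))
    expansion = Counter()
    for entity_id, confidence in entity_links:
        record = kb.get(entity_id)
        if record is None:
            continue  # entity not in the KB: contributes no features
        for field, weight in FIELD_WEIGHTS.items():
            values = record.get(field, [])
            if isinstance(values, str):
                values = [values]
            for value in values:
                for term in tokenize(value):
                    # Scale each feature by linker confidence and field weight.
                    expansion[term] += confidence * weight

    orig_dist = normalize(original)
    exp_dist = normalize(expansion)
    if not exp_dist:
        return orig_dist  # no usable entities: keep the original query
    return {
        term: lam * orig_dist.get(term, 0.0) + (1 - lam) * exp_dist.get(term, 0.0)
        for term in set(orig_dist) | set(exp_dist)
    }


if __name__ == "__main__":
    weights = expand_query("obama family tree", [("m.obama", 0.9)], KB)
    for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(f"{term}\t{w:.3f}")
```

In this sketch, a query term that is also reinforced by the linked entity (here "obama") ends up weighted above purely textual query terms, while new terms such as "barack" enter the query with smaller weights. The published approach uses richer entity features and a learned retrieval model, so the fixed weights here stand in only for the intuition.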
