Why finding entities in Wikipedia is difficult, sometimes

Entity Retrieval (ER), in contrast to classical search, aims at finding individual entities instead of relevant documents. Finding a list of entities therefore requires techniques different from those of classical search engines. In this paper, we present a model that describes entities more formally and show how an ER system can be built on top of it. We compare different approaches designed for finding entities in Wikipedia and report results on standard test collections. An analysis of entity-centric queries reveals different aspects of and problems related to ER and shows the limitations of current systems performing ER with Wikipedia. It also indicates which approaches are suitable for which kinds of queries.
