Topic Difficulty Prediction in Entity Ranking

Entity ranking has recently emerged as a research field that aims at retrieving entities as answers to a query. Unlike entity extraction, where the goal is to tag the names of entities in documents, entity ranking focuses on returning a ranked list of relevant entity names for the query. Many approaches to entity ranking have been proposed, and most of them have been evaluated on the INEX Wikipedia test collection. In this paper, we show that knowledge of predicted classes of topic difficulty can be used to further improve entity ranking performance. To predict topic difficulty, we build a classifier that uses features extracted from an INEX topic definition to assign the topic to an experimentally pre-determined class. This knowledge is then used to dynamically set optimal values for the retrieval parameters of our entity ranking system. Our experiments suggest that topic difficulty prediction is a promising approach for improving the effectiveness of entity ranking.
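The pipeline the abstract describes can be sketched in a few lines: extract features from a topic definition, predict a difficulty class, and map that class to retrieval-parameter settings. This is a minimal illustrative sketch, not the authors' code: the feature names, the decision-stump threshold (standing in for a learned classifier such as C4.5), and the parameter values are all hypothetical assumptions.

```python
# Hedged sketch of difficulty-aware parameter selection for entity
# ranking. All names, thresholds, and values below are illustrative
# assumptions, not the paper's tuned settings.

def topic_features(title: str, num_categories: int) -> dict:
    """Extract toy features from an INEX-style topic definition."""
    return {
        "title_length": len(title.split()),  # word count of the topic title
        "num_categories": num_categories,    # number of target categories given
    }

def predict_difficulty(feats: dict) -> str:
    """Decision-stump stand-in for a trained classifier (e.g. C4.5):
    a long title with little category evidence is called 'difficult'."""
    if feats["title_length"] > 4 and feats["num_categories"] <= 1:
        return "difficult"
    return "easy"

# Per-class retrieval parameters (e.g. weights on category and link
# evidence); placeholder values chosen only for illustration.
PARAMS = {
    "easy":      {"alpha": 0.1, "beta": 0.8},
    "difficult": {"alpha": 0.4, "beta": 0.5},
}

def parameters_for_topic(title: str, num_categories: int) -> dict:
    """Dynamically pick retrieval parameters for one topic."""
    cls = predict_difficulty(topic_features(title, num_categories))
    return PARAMS[cls]
```

In the paper's setting the classifier would be trained on past topics with known per-topic effectiveness, and the per-class parameter values would come from a grid search over a training collection; the stump above only shows where those pieces plug in.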
