Structured Document Retrieval, Multimedia Retrieval, and Entity Ranking Using PF/Tijah

CWI and the University of Twente used PF/Tijah, a flexible XML retrieval system, to evaluate structured document retrieval, multimedia retrieval, and entity ranking tasks in the context of INEX 2007. For the retrieval of textual and multimedia elements in the Wikipedia data, we investigated various length priors and found that biasing the ranking towards elements longer than those retrieved by our language modelling approach can be useful. For retrieving images in isolation, we found that their associated text is a very good source of evidence in the Wikipedia collection. For the entity ranking task, we used random walks to model multi-step relevance propagation from the articles describing entities to all related entities and beyond, and obtained promising results.
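The abstract does not give the scoring formula, so the following is only a rough illustration of how a length prior can bias a language-modelling ranking towards longer elements. It assumes query likelihood with Jelinek-Mercer smoothing (weight `lam`) and a prior P(e) ∝ |e|^β combined in log space: `beta` = 0 disables the prior, `beta` = 1 is a linear length prior, and larger values push the ranking further towards long elements. The functions `lm_log_score`, `log_length_prior`, and `rank`, their parameters, and the toy data are hypothetical and are not part of PF/Tijah.

```python
import math
from collections import Counter


def lm_log_score(query, element, coll_tf, coll_len, lam=0.5):
    """Log query likelihood of `element` with Jelinek-Mercer smoothing (assumed model)."""
    tf = Counter(element)
    n = len(element)
    score = 0.0
    for t in query:
        p_elem = tf[t] / n if n else 0.0          # P(t | element)
        p_coll = coll_tf.get(t, 0) / coll_len     # P(t | collection)
        p = lam * p_elem + (1 - lam) * p_coll     # smoothed term probability
        if p == 0.0:
            return float("-inf")                  # term unseen anywhere
        score += math.log(p)
    return score


def log_length_prior(n, beta=1.0):
    """log P(e) up to a constant, assuming P(e) is proportional to |e|**beta."""
    return beta * math.log(n) if n > 0 else float("-inf")


def rank(query, elements, coll_tf, coll_len, lam=0.5, beta=1.0):
    """Return element ids sorted by log prior + log likelihood, best first."""
    scored = {
        eid: lm_log_score(query, terms, coll_tf, coll_len, lam)
        + log_length_prior(len(terms), beta)
        for eid, terms in elements.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Toy collection: "a" is short and dense, "b" is longer with more occurrences.
elements = {"a": ["x"], "b": ["x", "x"] + ["y"] * 8}
coll_tf = Counter(t for terms in elements.values() for t in terms)
coll_len = sum(coll_tf.values())
print(rank(["x"], elements, coll_tf, coll_len, beta=0.0))  # ['a', 'b']
print(rank(["x"], elements, coll_tf, coll_len, beta=1.0))  # ['b', 'a']
```

Without a prior the short, dense element wins; a linear prior is already enough to lift the longer element above it, which is the kind of shift towards longer elements the abstract describes.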

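The entity ranking approach above propagates relevance over several steps with random walks. The abstract does not define the exact model, so the sketch below is a generic version under stated assumptions. A walker starts at the entities whose articles received an initial retrieval score. At each step it follows a uniformly chosen out-link with probability `alpha` and stops otherwise. An entity's final score is the expected relevance mass that reaches it within `steps` hops. The name `propagate`, the uniform transitions, and the fixed step count are all hypothetical choices for illustration.

```python
def propagate(initial, links, alpha=0.5, steps=3):
    """Spread relevance from seed entities over the link graph for `steps` hops.

    initial: entity -> initial relevance (e.g. retrieval score of its article)
    links:   entity -> list of entities its article links to
    alpha:   probability that the walker continues to the next hop (assumed)
    Returns entity -> accumulated relevance over hops 0..steps.
    """
    scores = dict(initial)
    frontier = dict(initial)
    for _ in range(steps):
        nxt = {}
        for src, mass in frontier.items():
            targets = links.get(src, [])
            if not targets:
                continue  # dangling entity: its mass stops here
            share = alpha * mass / len(targets)  # uniform transition over out-links
            for dst in targets:
                nxt[dst] = nxt.get(dst, 0.0) + share
        for ent, mass in nxt.items():
            scores[ent] = scores.get(ent, 0.0) + mass
        frontier = nxt
    return scores


# D is not linked from A directly but still receives relevance after two hops.
links = {"A": ["B", "C"], "B": ["D"]}
print(propagate({"A": 1.0}, links, alpha=0.5, steps=2))
# {'A': 1.0, 'B': 0.25, 'C': 0.25, 'D': 0.125}
```

Summing over hops rather than iterating to a stationary distribution (as PageRank does) keeps the seed articles' query-specific evidence dominant while still rewarding entities a few links away.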