Relevance feedback between hypertext and Semantic Web search: Frameworks and evaluation

We investigate the possibility of using Semantic Web data to improve hypertext Web search. In particular, we use relevance feedback to create a 'virtuous cycle' between data gathered from the Semantic Web of Linked Data and web-pages gathered from the hypertext Web. Previous approaches have generally treated search over the Semantic Web and search over the hypertext Web as entirely disparate tasks, indexing and searching over different domains. While relevance feedback has traditionally improved information retrieval performance, it is normally used to improve rankings over a single data-set. Our novel approach is to use relevance feedback from hypertext Web results to improve Semantic Web search, and results from the Semantic Web to improve the retrieval of hypertext Web data. In both cases, an evaluation is performed based on certain kinds of informational queries (abstract concepts, people, and places) selected from a real-life query log and checked by human judges. We evaluate our work over a wide range of algorithms and options, and show that on these queries it also improves on the baseline performance of deployed systems, such as the Semantic Web search engine FALCON-S and Yahoo! Web search. We further show that the use of Semantic Web inference seems to hurt performance, while pseudo-relevance feedback increases performance in both cases, although not as much as actual relevance feedback. Lastly, our evaluation is the first rigorous 'Cranfield' evaluation of Semantic Web search.
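To illustrate the general idea of feeding relevance judgments from one collection into ranking over another, here is a minimal sketch. It is not the method evaluated in this work: the term-counting expansion, the overlap scoring, the toy documents, and all function names (`expand_query`, `score`, `rerank`) are assumptions chosen for brevity.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def expand_query(query, feedback_docs, k=3):
    """Append the k most frequent new terms found in the feedback documents."""
    query_terms = tokenize(query)
    counts = Counter()
    for doc in feedback_docs:
        counts.update(t for t in tokenize(doc) if t not in query_terms)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ranked[:k]]


def score(doc, terms):
    """Sum of term frequencies of the query terms in the document."""
    tf = Counter(tokenize(doc))
    return sum(tf[t] for t in terms)


def rerank(docs, terms):
    """Order documents by descending score; ties keep their input order."""
    return sorted(docs, key=lambda d: score(d, terms), reverse=True)


# Hypothetical hypertext pages judged relevant for the query "paris".
hypertext_relevant = [
    "paris capital france eiffel tower",
    "paris france seine river",
]
# Hypothetical textual descriptions of two Semantic Web resources.
semantic_docs = [
    "paris hilton american media personality",
    "paris city capital of france",
]

expanded = expand_query("paris", hypertext_relevant, k=2)
reranked = rerank(semantic_docs, expanded)
```

In this toy setting the unexpanded query cannot separate the two Semantic Web descriptions, while the terms taken from the relevant hypertext pages move the description of the city to the top. Swapping the roles of the two collections gives the opposite direction of the cycle.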
