Webspace Retrieval Performance Experiment

Finding relevant information with search engines that index large portions of the World-Wide Web is often a frustrating task. Because the available information is so diverse, those search engines have to rely on techniques developed in the field of information retrieval (IR). More limited domains of the Internet, however, contain large collections of documents that have a highly structured and multimedia character, and whose content can be assumed to be more closely related. This allows the more precise and advanced query formulation techniques commonly used in a database environment to be applied to the Web. The Webspace Method focuses on such document collections and offers an approach for modelling and searching large collections of documents, based on a conceptual schema. The main focus of this article is the evaluation of a retrieval performance experiment, carried out to examine the advances of the webspace search engine over a standard search engine using a widely accepted IR model. A major improvement in retrieval performance, up to a factor of two as measured in terms of recall and precision, can be achieved when searching document collections using the Webspace Method.
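For readers unfamiliar with the two measures, the following minimal sketch shows how precision and recall are conventionally computed for a single query. The function name `precision_recall` and the document identifiers are hypothetical and chosen for illustration only; they are not taken from the experiment, and the numbers are not the article's results.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / documents retrieved
    recall    = relevant documents retrieved / relevant documents in the collection
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical relevance judgements and result lists (illustrative only).
relevant = {"d1", "d2", "d3", "d4"}
baseline_results = ["d1", "d5", "d6", "d7"]   # assumed output of a standard engine
webspace_results = ["d1", "d2", "d8", "d9"]   # assumed output of a schema-based engine

print(precision_recall(baseline_results, relevant))
print(precision_recall(webspace_results, relevant))
```

In this made-up case the second result list contains twice as many relevant documents as the first, so both measures double; this is the kind of comparison a "factor of two" improvement refers to.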
