Construction of a Test Collection for the Focussed Retrieval of Structured Documents

In this paper, we examine the methodological issues involved in constructing test collections of structured documents and obtaining best entry points for the evaluation of the focussed retrieval of document components. We describe a pilot test of the proposed test collection construction methodology performed on a document collection of Shakespeare plays. In our analysis, we examine the effect of query complexity and type on overall query difficulty, the use of multiple relevance judges for each query, the problem of obtaining exhaustive relevance assessments from participants, and the method of eliciting relevance assessments and best entry points. Our findings indicate that the methodology is feasible in this small-scale context and merits further investigation.
