This paper reports on the RMIT group’s approach to XML retrieval at INEX 2003. We indexed the XML documents using Lucy, a compact and fast text search engine designed and written by the Search Engine Group at RMIT University. For each INEX topic, up to 1000 of the documents ranked highest by Lucy were then loaded into and indexed by eXist, an open source native XML database. A query translator converts each INEX topic into a corresponding Lucy query expression and a corresponding eXist query expression. These query expressions may represent traditional information retrieval tasks (unconstrained, CO topics), or may focus on retrieving and ranking specific document components (constrained, CAS topics). For both expression types, we used eXist to extract the final answers (either full documents or document components) from the documents that Lucy ranked as highly relevant. We used several extraction strategies, each of which influenced the ranking order of the final answers differently. The final INEX results show that our choice of translation method and extraction strategy leads to very effective XML retrieval for the CAS topics. For the CO topics, however, we observed a system limitation that caused the same or similar choices to have little or no impact on retrieval performance.