Retrieval quality vs. effectiveness of specificity-oriented search in XML collections

Content-only queries on hierarchically structured documents should retrieve the most specific document nodes that are still exhaustive with respect to the information need. For this problem, we investigate two augmentation methods, both of which yield high retrieval quality. We define retrieval effectiveness as the ratio of retrieval quality to response time; thus, fast approximations to the 'correct' retrieval result may yield higher effectiveness. We present a classification scheme for algorithms addressing this issue and adapt known algorithms from standard document retrieval to XML retrieval. As a new strategy, we propose incremental-interruptible retrieval, which allows the top-ranking documents to be presented immediately. We develop a new algorithm implementing this strategy and evaluate the different methods on the INEX collection.
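
The two notions introduced above, effectiveness as a quality-to-time ratio and a ranking that can be consumed incrementally and interrupted, can be illustrated with a minimal Python sketch. It is not the algorithm developed in the paper: the function names, the heap-based ranking, the element paths, and the scores are all assumptions chosen for illustration.

```python
import heapq
from itertools import islice
from typing import Dict, Iterator, Tuple


def effectiveness(quality: float, response_time_s: float) -> float:
    """Ratio of retrieval quality to response time (hypothetical helper)."""
    if response_time_s <= 0:
        raise ValueError("response time must be positive")
    return quality / response_time_s


def incremental_ranking(scores: Dict[str, float]) -> Iterator[Tuple[str, float]]:
    """Yield (node, score) pairs best-first, one at a time.

    Assumption: scores are already computed. The heap is built in O(n) and
    each further result costs O(log n), so a caller that stops early never
    pays for a full sort.
    """
    heap = [(-score, node) for node, score in scores.items()]
    heapq.heapify(heap)
    while heap:
        neg_score, node = heapq.heappop(heap)
        yield node, -neg_score


# Hypothetical scores for XML element paths (illustrative values only).
example_scores = {
    "/article[1]": 0.40,
    "/article[1]/sec[2]": 0.75,
    "/article[1]/sec[2]/p[1]": 0.90,
    "/article[2]/sec[1]": 0.10,
}

# The consumer interrupts after two results; the remaining nodes are never ranked.
top_two = list(islice(incremental_ranking(example_scores), 2))
```

Under this definition, a method that reaches slightly lower quality in much less time can score a higher effectiveness than an exact but slow one, which is the trade-off the abstract refers to.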
