Topic Field Selection and Smoothing for XML Retrieval

Information retrieval from XML documents makes it possible to go below the document level in search of relevant information: any element of an XML document becomes a retrievable unit. We compare this element retrieval task with the traditional document retrieval task along two dimensions, investigating how different topic representations and language model smoothing approaches affect the performance of each task. We evaluate our ideas on the INEX 2002 XML retrieval test suite.
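To make the role of smoothing concrete, the following is a minimal sketch of query-likelihood scoring with linear interpolation (Jelinek-Mercer) smoothing. The abstract does not say which smoothing methods are compared, so the choice of method, the function name `score_element`, and the parameter `lam` are illustrative assumptions rather than the authors' implementation.

```python
import math
from collections import Counter


def score_element(query_terms, element_tokens, collection_tokens, lam=0.5):
    """Log query likelihood of one retrievable unit (element or document).

    Assumption: linear interpolation between the unit's own language model
    and a collection-wide background model, with `lam` weighting the unit.
    """
    elem_counts = Counter(element_tokens)
    coll_counts = Counter(collection_tokens)
    elem_len = len(element_tokens)
    coll_len = len(collection_tokens)

    score = 0.0
    for term in query_terms:
        # Maximum-likelihood estimates; an empty unit contributes nothing.
        p_elem = elem_counts[term] / elem_len if elem_len else 0.0
        p_coll = coll_counts[term] / coll_len if coll_len else 0.0
        p = lam * p_elem + (1.0 - lam) * p_coll
        if p == 0.0:
            # Term is unseen even in the collection: the query cannot be generated.
            return float("-inf")
        score += math.log(p)
    return score


# Hypothetical toy data: a short element and a tiny background collection.
collection = ["xml", "retrieval", "language", "model"]
element = ["xml", "retrieval"]
seen = score_element(["xml"], element, collection)
unseen_in_element = score_element(["language"], element, collection)
```

Short elements contain far fewer tokens than whole documents, so their maximum-likelihood estimates are sparser and the background model carries more of the weight. That is one reason the smoothing setting could matter differently for element retrieval than for document retrieval.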
