A Database Approach to Content-based XML Retrieval

This paper describes a first prototype system for content-based retrieval from XML data. The system's design supports both XPath queries and complex information retrieval queries based on a language modelling approach to information retrieval. Evaluation using the INEX benchmark shows that it is beneficial if the system is biased to retrieve large XML fragments over small fragments.
