Integration of IR into an XML Database

Structure matching has been the focus and strength of standard XML querying. However, textual content remains an essential component of XML data. It is therefore important to extend the standard XML database engine to support information retrieval (IR) style queries, namely keyword-based retrieval and result ranking. In this paper, we describe our effort to integrate information retrieval techniques into the Timber XML database system being developed at the University of Michigan, and our participation in the INitiative for the Evaluation of XML Retrieval (INEX).
