XML search: languages, INEX and scoring

The development of approaches to access XML content has generated a wealth of issues in the information retrieval (IR) and database (DB) communities (e.g., [2, 15, 17, 20, 19, 47, 26, 32, 24]). The IR community has traditionally focused on searching unstructured content and has developed various techniques for ranking query results and evaluating their effectiveness, whereas the DB community has focused on developing query languages and efficient evaluation algorithms for highly structured content. Recent trends in DB and IR research demonstrate a growing interest in merging IR and DB techniques for accessing XML content. Support for combining "structured" and full-text search to query XML documents effectively was unanimous in a recent panel at SIGMOD 2005 [3], and the combination is being widely studied in the IR community [20].

[1] Norbert Fuhr, et al. A probabilistic relational algebra for the integration of information retrieval and database systems, 1997, TOIS.

[2] Tova Milo, et al. Algebras for Querying Text Regions: Expressive Power and Optimization, 1998, J. Comput. Syst. Sci.

[3] N. Fuhr. An Extension of XQL for Information Retrieval, 2000.

[4] Torsten Schlieder. Similarity Search in XML Data using Cost-Based Query Transformations, 2001, WebDB.

[5] David J. DeWitt, et al. On supporting containment queries in relational database management systems, 2001, SIGMOD '01.

[6] Menzo Windhouwer, et al. Querying XML documents made easy: nearest concept queries, 2001, Proceedings 17th International Conference on Data Engineering.

[7] Nicholas Kushmerick, et al. Expressive and Efficient Ranked Querying of XML data, 2001, WebDB.

[8] Hans-Jörg Schek, et al. ETH Zürich at INEX: Flexible Information Retrieval from XML with PowerDB-XML, 2002, INEX Workshop.

[9] Michael Gertz, et al. XQuery/IR: Integrating XML Document and Data Retrieval, 2002, WebDB.

[10] S. Sudarshan, et al. Keyword searching and browsing in databases using BANKS, 2002, Proceedings 18th International Conference on Data Engineering.

[11] Gerhard Weikum, et al. The Index-Based XXL Search Engine for Querying XML Data with Relevance Ranking, 2002, EDBT.

[12] Luis Gravano, et al. Efficient IR-Style Keyword Search over Relational Databases, 2003, VLDB.

[13] Divesh Srivastava, et al. A System for Keyword Proximity Search on XML Databases, 2003, VLDB.

[14] Feng Shao, et al. XRANK: ranked keyword search over XML documents, 2003, SIGMOD '03.

[15] David Carmel, et al. Searching XML documents via XML fragments, 2003, SIGIR.

[16] S. Boag, et al. XQuery 1.0: An XML query language, W3C Working Draft 12 November 2003, 2003.

[17] Yehoshua Sagiv, et al. XSEarch: A Semantic Search Engine for XML, 2003, VLDB.

[18] Cong Yu, et al. Querying structured text in an XML database, 2003, SIGMOD '03.

[19] Jeffrey F. Naughton, et al. On the Integration of Structure Indexes and Inverted Lists, 2004, ICDE.

[20] Maarten de Rijke, et al. Length normalization in XML retrieval, 2004, SIGIR '04.

[21] Jeffrey F. Naughton, et al. On the integration of structure indexes and inverted lists, 2004, Proceedings. 20th International Conference on Data Engineering.

[22] Andrew Trotman, et al. NEXI, Now and Next, 2004, INEX.

[23] James P. Callan, et al. Hierarchical Language Models for XML Component Retrieval, 2004, INEX.

[24] Gabriella Kazai, et al. The overlap problem in content-oriented XML retrieval evaluation, 2004, SIGIR '04.

[25] Mounia Lalmas, et al. Providing consistent and exhaustive relevance assessments for XML retrieval evaluation, 2004, CIKM '04.

[26] Sihem Amer-Yahia, et al. Texquery: a full-text search extension to xquery, 2004, WWW '04.

[27] Benjamin Piwowarski, et al. An Algebra for Structured Queries in Bayesian Networks, 2004, INEX.

[28] Laks V. S. Lakshmanan, et al. FleXPath: flexible structure and full-text querying for XML, 2004, SIGMOD '04.

[29] Wesley W. Chu, et al. Configurable indexing and ranking for XML information retrieval, 2004, SIGIR '04.

[30] Cong Yu, et al. Schema-Free XQuery, 2004, VLDB.

[31] M. de Rijke, et al. The Importance of Length Normalization for XML Retrieval, 2005, Information Retrieval.

[32] Djoerd Hiemstra, et al. TIJAH Scratches INEX 2005: Vague Element Selection, Image Search, Overlap, and Relevance Feedback, 2005, INEX.

[33] Sihem Amer-Yahia, et al. Report on the DB/IR panel at SIGMOD 2005, 2005, SGMD.

[34] Mounia Lalmas, et al. Advances in XML Information Retrieval, Third International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2004, Dagstuhl Castle, Germany, December 6-8, 2004, Revised Selected Papers, 2005, INEX.

[35] Gabriella Kazai, et al. INEX 2005 Evaluation Measures, 2005, INEX.

[36] Charles L. A. Clarke, et al. Controlling overlap in content-oriented XML retrieval, 2005, SIGIR '05.

[37] Yannis Papakonstantinou, et al. Efficient keyword search for smallest LCAs in XML databases, 2005, SIGMOD '05.

[38] Shlomo Geva, et al. GPX - Gardens Point XML IR at INEX 2006, 2006, INEX.

[39] Patrick Gallinari, et al. Machine Learning Ranking and INEX'05, 2005, INEX.

[40] Sihem Amer-Yahia, et al. Structure and Content Scoring for XML, 2005, VLDB.

[41] Gerhard Weikum, et al. An Efficient and Versatile Query Engine for TopX Search, 2005, VLDB.

[42] Sihem Amer-Yahia, et al. XML Full-Text Search: Challenges and Opportunities, 2005, VLDB.

[43] Mohand Boughanem, et al. XFIRM at INEX 2005: Ad-Hoc and Relevance Feedback Tracks, 2005, INEX.

[44] Jaap Kamps, et al. The Effect of Structured Queries and Selective Indexing on XML Retrieval, 2005, INEX.

[45] Jaana Kekäläinen, et al. Query Evaluation with Structural Indices, 2005, INEX.

[46] Yosi Mass, et al. Using the INEX Environment as a Test Bed for Various User Models for XML Retrieval, 2005, INEX.

[47] James P. Callan, et al. Parameter Estimation for a Simple Hierarchical Generative Model for XML Retrieval, 2005, INEX.

[48] Andrew Trotman, et al. The Interpretation of CAS, 2005, INEX.

[49] Gabriella Kazai, et al. TopX & XXL at INEX 2005, 2005.

[50] Gerhard Weikum, et al. TopX and XXL at INEX 2005, 2005, INEX.

[51] Sihem Amer-Yahia, et al. Flexible and efficient XML search with complex full-text predicates, 2006, SIGMOD Conference.

[52] Sihem Amer-Yahia, et al. Expressiveness and Performance of Full-Text Search Languages, 2006, EDBT.

[53] Djoerd Hiemstra, et al. TIJAH Scratches INEX 2005. Vague Element Selection, Overlap, Image Search, Relevance Feedback, and Users (Notebook paper), 2006.

[54] Airi Salminen. PAT expressions: an algebra for text search, 2007.

[55] Scott Boag, et al. XQuery 1.0: An XML Query Language, 2007.