Keyword proximity search on XML graphs

XKeyword provides efficient keyword proximity queries on large XML graph databases. A query is simply a list of keywords; formulating it requires no knowledge of the schema or of a query language. XKeyword is built on a relational database and can therefore accommodate very large graphs. Query evaluation is optimized by using the graph's schema. XKeyword operates in two stages. In the preprocessing stage, a set of keyword indices is built, along with indexed path relations that describe particular patterns of paths in the graph. In the query processing stage, plans are developed that use a near-optimal set of path relations to efficiently locate the keyword query results. The results are presented graphically using the novel idea of interactive result graphs, which are populated on demand according to the user's navigation and allow efficient information discovery. We provide theoretical and experimental guidance for selecting the appropriate set of precomputed path relations. We also propose and experimentally evaluate algorithms that minimize the number of queries sent to the database to output the top-K results.
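
To make the notion of a keyword proximity query concrete, the following is a minimal in-memory sketch, not XKeyword's actual method: XKeyword works over a relational database with precomputed path relations, whereas this toy uses a plain inverted index and breadth-first search. All names here (`build_keyword_index`, `distances_from`, `proximity_search`, and the sample `nodes` and `edges`) are hypothetical and chosen only for illustration, and the sketch assumes a two-keyword query on an undirected graph.

```python
from collections import defaultdict, deque

# Hypothetical toy XML graph: node id -> text content, plus containment/reference edges.
nodes = {
    "paper1": "paper",
    "title1": "keyword search xml",
    "author1": "smith",
    "paper2": "paper",
    "title2": "relational storage",
    "author2": "jones",
}
edges = [
    ("paper1", "title1"),
    ("paper1", "author1"),
    ("paper2", "title2"),
    ("paper2", "author2"),
    ("paper1", "paper2"),  # e.g. a citation link between the two papers
]

def build_keyword_index(nodes):
    """Map each lowercase keyword to the set of node ids whose text contains it."""
    index = defaultdict(set)
    for node_id, text in nodes.items():
        for word in text.lower().split():
            index[word].add(node_id)
    return index

def distances_from(source, adjacency, max_dist):
    """Breadth-first search: hop distance from source to each node within max_dist."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if dist[current] == max_dist:
            continue  # do not expand beyond the distance limit
        for neighbor in adjacency[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist

def proximity_search(keywords, nodes, edges, max_dist=4, top_k=3):
    """Return up to top_k (distance, node_a, node_b) triples for a two-keyword query."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    index = build_keyword_index(nodes)
    first, second = (k.lower() for k in keywords)
    results = []
    for a in index.get(first, ()):
        dist = distances_from(a, adjacency, max_dist)
        for b in index.get(second, ()):
            if b in dist:
                results.append((dist[b], a, b))
    return sorted(results)[:top_k]  # closest pairs first

print(proximity_search(["xml", "jones"], nodes, edges))
```

In this toy graph the query `["xml", "jones"]` connects `title1` to `author2` through the two paper nodes. A system at the scale the abstract describes would avoid per-query graph traversal of this kind, which is the role the precomputed path relations play.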
