Answering XML queries by means of data summaries

XML is a verbose representation of semistructured data and may require huge amounts of storage space. We propose a summarized representation of XML data, based on the concept of instance pattern, which both provides succinct information and can be queried directly. The physical representation of instance patterns exploits itemsets or association rules to summarize the content of XML datasets. Instance patterns may be used to answer queries, possibly only partially, either when fast, approximate answers are required or when the actual dataset is not available, for example because it is currently unreachable. Experiments on large XML documents show that instance patterns allow a significant reduction in storage space while preserving almost entirely the completeness of the query result. Furthermore, they provide fast query answers and scale well with the size of the dataset, thus overcoming the document size limitation of most current XQuery engines.
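As a rough illustration of how an association rule can summarize XML content, the sketch below computes the support and confidence of a single rule over a toy document. It is a minimal sketch under stated assumptions, not the instance-pattern representation or mining procedure proposed here: the sample document, its element names, and the helper `rule_stats` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy dataset; element names and values are illustrative only.
XML_DOC = """
<conferences>
  <article><author>Smith</author><conference>VLDB</conference></article>
  <article><author>Smith</author><conference>VLDB</conference></article>
  <article><author>Smith</author><conference>ICDE</conference></article>
  <article><author>Jones</author><conference>ICDE</conference></article>
</conferences>
"""


def rule_stats(root, record_tag, body, head):
    """Return (support, confidence) of the rule body -> head.

    body and head are (tag, value) pairs, tested against the direct
    children of every record_tag element found directly under root.
    """
    records = root.findall(record_tag)

    def matches(record, item):
        tag, value = item
        return any(child.text == value for child in record.findall(tag))

    n_body = sum(1 for r in records if matches(r, body))
    n_both = sum(1 for r in records if matches(r, body) and matches(r, head))
    support = n_both / len(records) if records else 0.0
    confidence = n_both / n_body if n_body else 0.0
    return support, confidence


root = ET.fromstring(XML_DOC)
support, confidence = rule_stats(
    root, "article", ("author", "Smith"), ("conference", "VLDB")
)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

In this toy setting, a stored rule such as "author = Smith implies conference = VLDB", together with its support and confidence, could give an approximate answer to a query about where Smith publishes without access to the original document.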
