Incremental Mining of Frequent Query Patterns from XML Queries for Caching

Existing studies for mining frequent XML query patterns mainly introduce a straightforward candidate generate-and-test strategy and compute frequencies of candidate query patterns from scratch periodically by checking the entire transaction database, which consists of XML query patterns transformed from user queries. However, it is nontrivial to maintain such discovered frequent patterns in real XML databases because there may incur frequent updates that may not only invalidate some existing frequent query patterns but also generate some new frequent ones. Accordingly, existing proposals are inefficient for the evolution of the transaction database. To address these problems, this paper presents an efficient algorithm IPS-FXQPMiner for mining frequent XML query patterns without candidate maintenance and costly tree-containment checking. We transform XML queries into sequences through a one- to-one mapping and then mine the frequent sequences to generate frequent XML query patterns. More importantly, based on IPS-FXQPMiner, an efficient incremental algorithm, Incre-FXQPMiner is proposed to incrementally mine frequent XML query patterns, which can minimize the I/O and computation requirements for handling incremental updates. Our experimental study on various real-life datasets demonstrates the efficiency and scalability of our algorithms over previous known alternatives.

[1]  Andrew Lim,et al.  D(k)-index: an adaptive structural summary for graph-structured data , 2003, SIGMOD '03.

[2]  Hannu Toivonen,et al.  Finding Frequent Substructures in Chemical Compounds , 1998, KDD.

[3]  Bongki Moon,et al.  PRIX: indexing and querying XML using prufer sequences , 2004, Proceedings. 20th International Conference on Data Engineering.

[4]  Dan Suciu,et al.  Query Caching and View Selection for XML Databases , 2005, VLDB.

[5]  Sridhar Ramaswamy,et al.  Cyclic association rules , 1998, Proceedings 14th International Conference on Data Engineering.

[6]  Mong-Li Lee,et al.  Efficient Mining of XML Query Patterns for Caching , 2003, VLDB.

[7]  Jiawei Han,et al.  BIDE: efficient mining of frequent closed sequences , 2004, Proceedings. 20th International Conference on Data Engineering.

[8]  Hamid Pirahesh,et al.  A Framework for Using Materialized XPath Views in XML Query Processing , 2004, VLDB.

[9]  Dan Suciu,et al.  Index Structures for Path Expressions , 1999, ICDT.

[10]  Mohammed J. Zaki Efficiently mining frequent trees in a forest , 2002, KDD.

[11]  Ke Wang,et al.  Discovering Structural Association of Semistructured Data , 2000, IEEE Trans. Knowl. Data Eng..

[12]  Yi Chen,et al.  Incremental mining of frequent XML query patterns , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[13]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[14]  Umeshwar Dayal,et al.  FreeSpan: frequent pattern-projected sequential pattern mining , 2000, KDD '00.

[15]  George Karypis,et al.  Frequent subgraph discovery , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[16]  Sushil Jajodia,et al.  Mining Temporal Relationships with Multiple Granularities in Time Sequences , 1998, IEEE Data Eng. Bull..

[17]  Qiming Chen,et al.  PrefixSpan,: mining sequential patterns efficiently by prefix-projected pattern growth , 2001, Proceedings 17th International Conference on Data Engineering.

[18]  Johannes Gehrke,et al.  Sequential PAttern mining using a bitmap representation , 2002, KDD.

[19]  Philip S. Yu,et al.  ViST: a dynamic index method for querying XML data by tree structures , 2003, SIGMOD '03.

[20]  Alexandre Termier,et al.  TreeFinder: a first step towards XML data mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[21]  Jiawei Han,et al.  Efficient mining of partial periodic patterns in time series database , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[22]  Ramakrishnan Srikant,et al.  Mining Sequential Patterns: Generalizations and Performance Improvements , 1996, EDBT.

[23]  Florent Masseglia,et al.  The PSP Approach for Mining Sequential Patterns , 1998, PKDD.

[24]  Hiroki Arimura,et al.  Efficient Substructure Discovery from Large Semi-Structured Data , 2001, IEICE Trans. Inf. Syst..

[25]  Xifeng Yan,et al.  CloSpan: Mining Closed Sequential Patterns in Large Datasets , 2003, SDM.

[26]  Philip S. Yu,et al.  Mining long sequential patterns in a noisy environment , 2002, SIGMOD '02.

[27]  Mong-Li Lee,et al.  Mining frequent query patterns from XML queries , 2003, Eighth International Conference on Database Systems for Advanced Applications, 2003. (DASFAA 2003). Proceedings..

[28]  Joonho Kwon,et al.  FiST: Scalable XML Document Filtering by Sequencing Twig Patterns , 2005, VLDB.