BUXMiner: An Efficient Bottom-Up Approach to Mining XML Query Patterns

Discovering frequent XML query patterns in a history log of XML queries can expedite XML query processing, because the answers to these queries can be cached and reused when future queries "hit" such frequent patterns. In this paper, we propose an efficient bottom-up approach to mining frequent query patterns from XML queries. We first merge all queries into a summarizing structure called the global tree guide (GTG). We then refine the GTG by pruning infrequent nodes and clustering adjacent nodes in the queries, which yields a compressed GTG (CGTG). Finally, we traverse the CGTG bottom-up, generating the frequent query patterns for each node until the root of the CGTG is reached. Experiments show that the proposed method is efficient and outperforms previous algorithms for mining XML queries, such as XQPMinerTID and FastXMiner.
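The merge-and-prune step can be illustrated with a small sketch. The abstract does not specify how the GTG is built, so the following is only one plausible reading under stated assumptions: each query is a rooted label tree written as (label, [subtrees]), all queries share the same root label, and a node is identified by its root-to-node label path. The names GTGNode, merge_query, build_gtg, prune, and paths are hypothetical, and the clustering and bottom-up pattern generation steps are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class GTGNode:
    """Hypothetical guide node: a label, the ids of queries that contain it, and children by label."""
    label: str
    query_ids: set = field(default_factory=set)
    children: dict = field(default_factory=dict)


def merge_query(node, query, qid):
    """Merge one query tree, given as (label, [subtrees]), into the guide rooted at node."""
    label, subtrees = query
    assert label == node.label, "assumption: all queries share the same root label"
    node.query_ids.add(qid)
    for sub in subtrees:
        child = node.children.setdefault(sub[0], GTGNode(sub[0]))
        merge_query(child, sub, qid)


def build_gtg(queries):
    """Merge every query into a single guide tree, recording support per node."""
    root = GTGNode(queries[0][0])
    for qid, query in enumerate(queries):
        merge_query(root, query, qid)
    return root


def prune(node, min_support):
    """Remove children contained in fewer than min_support queries, with their subtrees."""
    node.children = {
        label: child
        for label, child in node.children.items()
        if len(child.query_ids) >= min_support
    }
    for child in node.children.values():
        prune(child, min_support)


def paths(node, prefix=()):
    """Return (path, support) pairs for every remaining node, depth-first, children sorted by label."""
    here = prefix + (node.label,)
    result = [("/".join(here), len(node.query_ids))]
    for label in sorted(node.children):
        result.extend(paths(node.children[label], here))
    return result


# Three toy queries over a hypothetical "book" document.
queries = [
    ("book", [("title", []), ("author", [("name", [])])]),
    ("book", [("title", []), ("price", [])]),
    ("book", [("author", [("name", [])]), ("title", [])]),
]
gtg = build_gtg(queries)
prune(gtg, min_support=2)
```

In this sketch, removing an infrequent node together with its whole subtree is safe because any query that contains a node's path also contains the path of its parent, so a descendant can never have higher support than its ancestor.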
