Mining XML-Enabled Association Rules with Templates

XML-enabled association rule framework [8] extends the notion of associated items to XML fragments to present associations among trees rather than simple-structured items of atomic values. They are more flexible and powerful in representing both simple and complex structured association relationships inherent in XML data. Compared with traditional association mining in the well-structured world, mining from XML data, however, is confronted with more challenges due to the inherent flexibilities of XML in both structure and semantics. The primary challenges include 1) a more complicated hierarchical data structure; 2) an ordered data context; and 3) a much bigger data size. In order to make XML-enabled association rule mining truly practical and computationally tractable, in this study, we present a template model to help users specify the interesting XML-enabled associations to be mined. Techniques for template-guided mining of association rules from large XML data are also described in the paper. We demonstrate the effectiveness of these techniques through a set of experiments on both synthetic and real-life data.

[1]  Ke Wang,et al.  Discovering typical structures of documents: a road map approach , 1998, SIGIR '98.

[2]  Hans Weigand,et al.  An XML-Enabled Association Rule Framework , 2003, DEXA.

[3]  Mohammed J. Zaki Efficiently mining frequent trees in a forest , 2002, KDD.

[4]  Alexandre Termier,et al.  TreeFinder: a first step towards XML data mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[5]  Chris Clifton,et al.  Query flocks: a generalization of association-rule mining , 1998, SIGMOD '98.

[6]  Philip S. Yu,et al.  ViST: a dynamic index method for querying XML data by tree structures , 2003, SIGMOD '03.

[7]  Takayoshi Shoudai,et al.  Discovery of Frequent Tree Structured Patterns in Semistructured Web Documents , 2001, PAKDD.

[8]  Ramakrishnan Srikant,et al.  Mining Association Rules with Item Constraints , 1997, KDD.

[9]  Laks V. S. Lakshmanan,et al.  Optimization of constrained frequent set queries with 2-variable constraints , 1999, SIGMOD '99.

[10]  Zhigang Li,et al.  Efficient data mining for maximal frequent subtrees , 2003, Third IEEE International Conference on Data Mining.

[11]  Petra Perner,et al.  Data Mining - Concepts and Techniques , 2002, Künstliche Intell..

[12]  Bin Chen,et al.  Generating association rules from semi-structured documents using an extended concept hierarchy , 1997, CIKM '97.

[13]  Hiroki Arimura,et al.  Efficient Substructure Discovery from Large Semi-Structured Data , 2001, IEICE Trans. Inf. Syst..

[14]  Ke Wang,et al.  Discovering Structural Association of Semistructured Data , 2000, IEEE Trans. Knowl. Data Eng..

[15]  Ke Wang,et al.  Discovering Frequent Substructures from Hierarchical Semi-structured Data , 2002, SDM.

[16]  Ioana Manolescu,et al.  Why and how to benchmark XML databases , 2001, SGMD.

[17]  Hannu Toivonen,et al.  Discovery of frequent DATALOG patterns , 1999, Data Mining and Knowledge Discovery.

[18]  Ke Wang,et al.  Schema Discovery for Semistructured Data , 1997, KDD.

[19]  Jiawei Han,et al.  Meta-Rule-Guided Mining of Association Rules in Relational Databases , 1995, KDOOD/TDOOD.

[20]  Hiroki Arimura,et al.  Discovering Frequent Substructures in Large Unordered Trees , 2003, Discovery Science.

[21]  Elena Baralis,et al.  Designing Templates for Mining Association Rules , 2004, Journal of Intelligent Information Systems.

[22]  Giuseppe Psaila,et al.  A New SQL-like Operator for Mining Association Rules , 1996, VLDB.

[23]  Laks V. S. Lakshmanan,et al.  Exploratory mining and pruning optimizations of constrained associations rules , 1998, SIGMOD '98.