A Framework for Efficient Association Rule Mining in XML Data

In this article, we propose a framework, called XAR-Miner, for mining association rules (ARs) from XML documents efficiently. In XAR-Miner, raw data in the XML document are first preprocessed and transformed into either an Indexed XML Tree (IX-tree) or Multirelational Databases (Multi-DB), depending on the size of the XML document and the memory constraint of the system, to support efficient data selection and AR mining. Concepts that are relevant to the AR mining task are generalized to produce generalized metapatterns. A suitable metric is devised for measuring the degree of concept generalization in order to prevent undergeneralization or overgeneralization. The resulting generalized metapatterns are used to generate large ARs that meet the specified support and confidence levels. A greedy algorithm is also presented that integrates data selection and large itemset generation to enhance the efficiency of the AR mining process. Experiments show that XAR-Miner is more efficient at performing a large number of AR mining tasks on XML documents than the state-of-the-art method of repeatedly scanning the XML documents to perform each mining task.
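
The support and confidence thresholds mentioned above follow the standard AR definitions. The following is a minimal sketch of those two measures only, assuming transactions have already been extracted from the XML data as sets of items; it is not XAR-Miner itself, and the function names (`support`, `confidence`, `is_large_rule`) are hypothetical.

```python
from typing import FrozenSet, Iterable, Sequence


def support(itemset: Iterable[str], transactions: Sequence[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: Sequence[FrozenSet[str]]) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    lhs = frozenset(antecedent)
    both = lhs | frozenset(consequent)
    lhs_support = support(lhs, transactions)
    if lhs_support == 0.0:
        return 0.0
    return support(both, transactions) / lhs_support


def is_large_rule(antecedent: Iterable[str], consequent: Iterable[str],
                  transactions: Sequence[FrozenSet[str]],
                  min_support: float, min_confidence: float) -> bool:
    """True if the rule meets both user-specified thresholds (assumed inputs)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return (support(both, transactions) >= min_support
            and confidence(antecedent, consequent, transactions) >= min_confidence)
```

A rule `X -> Y` is kept only when the itemset `X ∪ Y` is frequent enough (support) and `Y` appears often enough among the transactions containing `X` (confidence).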
