Efficient Monitoring of Patterns in Data Mining Environments

In this article, we introduce a general framework for monitoring patterns and detecting interesting changes without continuously mining the data. Using our approach, the effort spent on data mining can be drastically reduced while the knowledge extracted from the data is kept up to date. Our methodology is based on a temporal representation for patterns, in which both the content and the statistics of a pattern are modeled. We divide the KDD process into two phases. In the first phase, data from the first period is mined and interesting rules and patterns are identified. In the second phase, using the data from subsequent periods, statistics of these rules are extracted in order to decide whether or not they still hold. We applied this technique in a case study on mining mail log data. Our results show that a minimal set of patterns reflecting the invariant properties of the dataset can be identified, and that interesting changes to the population can be recognized indirectly by monitoring a subset of the patterns found in the first phase.
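The two-phase process can be illustrated with a minimal sketch. It is not the authors' implementation: the names `MonitoredRule`, `mine_first_period`, and `monitor_period` are hypothetical, the phase-one miner is a brute-force stand-in that only enumerates single-item association rules, and the test for whether a rule "still holds" is assumed to be a simple support and confidence threshold. Each rule stores its content together with a per-period history of its statistics, loosely mirroring the temporal representation described above.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class MonitoredRule:
    """A rule's content (lhs -> rhs) plus its statistics per period."""
    lhs: frozenset
    rhs: frozenset
    # One (period, support, confidence) entry per period seen so far.
    history: list = field(default_factory=list)


def rule_stats(rule, transactions):
    """Support and confidence of one rule over a list of item sets."""
    n_lhs = sum(1 for t in transactions if rule.lhs <= t)
    n_both = sum(1 for t in transactions if (rule.lhs | rule.rhs) <= t)
    support = n_both / len(transactions) if transactions else 0.0
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


def mine_first_period(transactions, min_support, min_confidence):
    """Phase 1 (assumed stand-in miner): enumerate rules a -> b on single items."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        rule = MonitoredRule(frozenset([a]), frozenset([b]))
        support, confidence = rule_stats(rule, transactions)
        if support >= min_support and confidence >= min_confidence:
            rule.history.append((0, support, confidence))
            rules.append(rule)
    return rules


def monitor_period(rules, period, transactions, min_support, min_confidence):
    """Phase 2: recompute statistics of the known rules only, without re-mining.

    Returns the rules that fall below the (assumed) thresholds in this period.
    """
    broken = []
    for rule in rules:
        support, confidence = rule_stats(rule, transactions)
        rule.history.append((period, support, confidence))
        if support < min_support or confidence < min_confidence:
            broken.append(rule)
    return broken
```

In this sketch, the cost of each later period is one pass over the data per monitored rule, rather than a full search over candidate patterns. A call such as `monitor_period(rules, 1, period_1_transactions, 0.5, 0.8)` extends every rule's history and returns the rules whose statistics changed enough to warrant attention.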
