Mining Interesting Patterns Using Estimated Frequencies from Subpatterns and Superpatterns

In knowledge discovery in databases, the number of discovered patterns is often too large for a human to comprehend, so less important patterns must be filtered out. For this purpose, a number of interestingness measures have been introduced. Conventional measures score a pattern by how much its actual frequency exceeds the value predicted from its subpatterns. Such measures may assign high scores not only to a pattern consisting of a set of strongly correlated items but also to its subpatterns, and in many cases it is unnecessary to select all of these subpatterns as interesting. To reduce this redundancy, we propose a new approach to evaluating the interestingness of patterns. Our measure evaluates how much the actual frequency of a pattern exceeds the frequency predicted not only from its subpatterns but also from its superpatterns. By adding the estimate from superpatterns, our measure filters out redundant subpatterns more effectively than conventional measures. We discuss the effectiveness of our interestingness measure through a set of experimental results.
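
The idea can be illustrated with a minimal sketch. It is not the measure defined in the paper: the abstract gives no formulas, so both predictors below are assumptions chosen for illustration. The subpattern prediction is an independence estimate (the product of single-item frequencies), the superpattern prediction is the frequency of the most frequent one-item extension, and the score is the excess of the actual frequency over the larger of the two. All function names and the toy data are hypothetical.

```python
from math import prod


def frequency(pattern, transactions):
    """Fraction of transactions that contain every item of the pattern."""
    pattern = frozenset(pattern)
    return sum(pattern <= t for t in transactions) / len(transactions)


def predicted_from_subpatterns(pattern, transactions):
    """Assumed predictor: items occur independently (product of item frequencies)."""
    return prod(frequency({item}, transactions) for item in pattern)


def predicted_from_superpatterns(pattern, transactions, items):
    """Assumed predictor: frequency of the most frequent one-item extension."""
    pattern = frozenset(pattern)
    extensions = [pattern | {item} for item in items if item not in pattern]
    return max((frequency(e, transactions) for e in extensions), default=0.0)


def interestingness(pattern, transactions, items):
    """Excess of the actual frequency over the larger of the two predictions."""
    actual = frequency(pattern, transactions)
    sub = predicted_from_subpatterns(pattern, transactions)
    sup = predicted_from_superpatterns(pattern, transactions, items)
    return actual - max(sub, sup)


# Toy data: {a, b, c} always occur together; d is unrelated.
transactions = (
    [frozenset("abc")] * 4
    + [frozenset("d")] * 4
    + [frozenset("a"), frozenset("b")]
)
items = {"a", "b", "c", "d"}

score_abc = interestingness("abc", transactions, items)
score_ab = interestingness("ab", transactions, items)
conventional_ab = frequency("ab", transactions) - predicted_from_subpatterns(
    "ab", transactions
)
```

In this toy data the subpattern {a, b} exceeds its independence estimate, so a subpattern-only score would flag it. Its frequency is fully accounted for by the superpattern {a, b, c}, however, so the combined score drops to zero while {a, b, c} itself keeps a positive score.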
