Diverging patterns: discovering significant frequency change dissimilarities in large databases

In this paper, we present a framework for mining diverging patterns, a new type of contrast pattern whose frequency changes in significantly different ways in two datasets; for example, it rises from a relatively low to a relatively high value in one dataset but falls from high to low in the other. In this framework, we define a measure called the diverging ratio and use it to discover diverging patterns. We represent a pattern by a four-dimensional vector and define the pattern's diverging ratio based on the angular difference between its vectors in the two datasets. We propose an algorithm for mining diverging patterns from a pair of datasets; it uses a standard frequent pattern mining algorithm to compute the vector components efficiently. We demonstrate the effectiveness of our approach on real-world datasets, showing that the method can reveal novel knowledge from large databases.
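To make the geometric idea concrete, the following is a minimal sketch of computing the angle between two four-dimensional vectors, one per dataset. The abstract does not state what the four components are or how the diverging ratio is derived from the angle, so the function name `angular_difference` and the example vectors are assumptions chosen purely for illustration, not the paper's definition.

```python
import math

def angular_difference(u, v):
    """Return the angle in radians between two 4-dimensional vectors.

    Hypothetical helper: the vector components are placeholders, since the
    abstract does not define them.
    """
    if len(u) != 4 or len(v) != 4:
        raise ValueError("both vectors must have exactly four components")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("the angle is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)

# Illustrative values only: one pattern's vector in each of two datasets.
vector_in_dataset_1 = (0.1, 0.4, 0.1, 0.4)
vector_in_dataset_2 = (0.4, 0.1, 0.4, 0.1)
angle = angular_difference(vector_in_dataset_1, vector_in_dataset_2)
```

A larger angle indicates that the two vectors point in more different directions; how the paper turns this angle into a diverging ratio is not specified in the abstract.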
