Rare Itemset Mining

A pattern is a collection of events or features that occur together in a transaction database. Previous studies in the field have mostly addressed frequent pattern mining, where only patterns that appear frequently in the input data are mined. As a result, patterns involving events or features that appear in only a few transactions are not captured. In some domains, such as the detection of computer attacks or of fraudulent transactions in financial institutions, these patterns, known as rare patterns, are more interesting than frequent ones. We propose a framework to represent different categories of interesting patterns and then instantiate it for the specific case of rare patterns. We then present a generic framework for mining patterns based on the Apriori approach. In this paper we are interested in patterns composed of a set of items, also called itemsets, so we instantiate the generalized Apriori framework to mine rare itemsets. The resulting approach is Apriori-like. Its main idea is that if the itemset lattice used to represent the itemset space in classical Apriori approaches is traversed in a bottom-up manner, properties equivalent to those that drive the Apriori exploration of frequent itemsets become available for mining rare itemsets. These include an anti-monotone property and a level-wise exploration of the itemset space. As demonstrated by our experiments, our approach is effective in identifying all rare itemsets and is more efficient than the existing approach.
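The level-wise, anti-monotone exploration described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that an itemset is rare when its absolute support is strictly below a threshold `min_sup`, and it reads "bottom-up" as starting from the itemset that contains every item and moving one level at a time towards single items. Because adding items to an itemset can never increase its support, every superset of a rare itemset is also rare; a (k-1)-itemset therefore only needs to be counted if all of its k-item supersets were found rare at the previous level. The names `support` and `mine_rare_itemsets` are hypothetical, and support is counted by a naive scan of the database.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def support(itemset: Itemset, db: List[Itemset]) -> int:
    """Absolute support: number of transactions containing every item of itemset."""
    return sum(1 for t in db if itemset <= t)


def mine_rare_itemsets(transactions: Iterable[Iterable[str]], min_sup: int) -> Dict[Itemset, int]:
    """Return {itemset: support} for every non-empty itemset with support < min_sup.

    Assumption (hypothetical): "rare" means support strictly below min_sup, and the
    lattice is explored from the largest itemset down to single items.
    """
    db = [frozenset(t) for t in transactions]
    universe: Itemset = frozenset().union(*db)
    rare: Dict[Itemset, int] = {}
    if not universe:
        return rare

    # Top level: the single itemset made of every item in the database.
    level: Set[Itemset] = set()
    sup = support(universe, db)
    if sup < min_sup:
        level.add(universe)
        rare[universe] = sup

    # Move down one level (k -> k-1) while rare k-itemsets exist and k > 1.
    while level and len(next(iter(level))) > 1:
        candidates: Set[Itemset] = set()
        for z in level:
            for item in z:
                y = z - {item}
                # Pruning: y can be rare only if every one-item extension of y is rare;
                # a single non-rare superset would make y non-rare as well.
                if all((y | {extra}) in level for extra in universe - y):
                    candidates.add(y)
        # Count only the surviving candidates and keep the rare ones.
        level = set()
        for y in candidates:
            sup = support(y, db)
            if sup < min_sup:
                level.add(y)
                rare[y] = sup
    return rare


if __name__ == "__main__":
    toy_db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}, {"b", "d"}]
    found = mine_rare_itemsets(toy_db, min_sup=2)
    for itemset, sup in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), sup)
```

On the toy database above with `min_sup=2`, the sketch reports ten rare itemsets, and the singletons {a}, {b} and {c} are pruned without a support count because {a, b} or {a, c} is frequent.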
