On Minimal Infrequent Itemset Mining

A new algorithm for minimal infrequent itemset mining is presented. Potential applications of finding infrequent itemsets include statistical disclosure risk assessment, bioinformatics, and fraud detection. This is the first algorithm designed specifically for finding these rare itemsets. Many itemset properties used implicitly in the algorithm are proved. The problem is shown to be NP-complete. Experimental results are then presented.
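To make the object of the search concrete: an itemset is usually called minimal infrequent when its support falls below a chosen threshold while every proper subset meets it. The sketch below is a brute-force illustration of that definition only, not the algorithm presented in the paper. The function name, the representation of transactions as Python sets, and the absolute-count threshold are assumptions made for the example, and the threshold is assumed not to exceed the number of transactions.

```python
from itertools import combinations


def minimal_infrequent_itemsets(transactions, threshold):
    """Brute-force enumeration (illustrative only, exponential in the number of items).

    An itemset is reported when its support is below `threshold` and no
    previously reported (smaller) infrequent itemset is contained in it.
    """
    items = sorted(set().union(*transactions))

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    minimal = []
    # Visit candidates by increasing size, so that every smaller minimal
    # infrequent itemset is already known when a candidate is examined.
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # A candidate containing a known minimal infrequent itemset
            # is infrequent but not minimal, so it is skipped.
            if any(m <= candidate for m in minimal):
                continue
            if support(candidate) < threshold:
                minimal.append(candidate)
    return minimal


# Hypothetical toy data, chosen only for illustration.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
result = minimal_infrequent_itemsets(transactions, threshold=2)
```

With this toy data and a threshold of 2, every single item and the pairs {a, b} and {a, c} are frequent, while {b, c} occurs in only one transaction, so it is the only minimal infrequent itemset. The exhaustive enumeration is exponential in the number of items and is practical only on very small inputs.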
