Learning quantifiable associations via principal sparse non-negative matrix factorization

Association rules are traditionally designed to capture statistical relationships among itemsets in a given database. To additionally capture quantitative association knowledge, Korn et al. recently proposed a paradigm named Ratio Rules [22] for quantifiable data mining. However, their approach is based mainly on Principal Component Analysis (PCA), and as a result it cannot guarantee that the ratio coefficients are non-negative. Because the coefficients express quantities of items, negative values are difficult to interpret, and this may lead to serious problems when the rules are applied. In this paper, we propose a new method, called Principal Sparse Non-negative Matrix Factorization (PSNMF), for learning the associations between itemsets in the form of Ratio Rules. In addition, we provide a support measurement to weigh the importance of each rule for the entire dataset. Experiments on several datasets illustrate that the proposed method performs well at discovering latent associations between itemsets in large datasets.
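To illustrate why a non-negative factorization yields ratio coefficients that read as quantities, the following is a minimal sketch of plain non-negative matrix factorization (NMF) [19]. It is fitted with Lee and Seung's multiplicative updates for the squared-error objective and written in pure Python so it stays self-contained. It is not the PSNMF algorithm. The `sparsity` parameter (an L1 penalty on W) and the normalization of each row of H to sum to 1 are assumptions standing in for the paper's sparseness constraint. The function names `nmf`, `matmul`, `transpose`, and `relative_error`, as well as the toy purchase data, are hypothetical.

```python
import random


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def relative_error(V, W, H):
    """Frobenius norm of V - WH divided by the Frobenius norm of V."""
    WH = matmul(W, H)
    err = sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))
    ref = sum(v * v for rv in V for v in rv)
    return (err / ref) ** 0.5


def nmf(V, k, iters=500, sparsity=0.0, seed=0, eps=1e-9):
    """Factor non-negative V (n transactions x m items) as V ~ W H.

    W (n x k) >= 0 holds per-transaction weights; each row of H (k x m) >= 0 is
    read as one ratio rule over the items. Uses Lee-Seung multiplicative updates;
    `sparsity` is a hypothetical L1 penalty on W, not the paper's PSNMF term.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H): every factor is non-negative.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)]
             for r in range(k)]
        # W <- W * (V H^T) / (W H H^T + sparsity): the penalty shrinks W toward 0.
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][r] * num[i][r] / (den[i][r] + sparsity + eps) for r in range(k)]
             for i in range(n)]
        # Rescale each rule to sum to 1 (a ratio); W absorbs the scale,
        # so the product WH is unchanged up to eps.
        for r in range(k):
            s = sum(H[r]) + eps
            H[r] = [h / s for h in H[r]]
            for i in range(n):
                W[i][r] *= s
    return W, H


if __name__ == "__main__":
    items = ["bread", "butter", "milk", "cereal"]
    # Hypothetical purchases mixing a 2:1 bread:butter and a 3:1 milk:cereal pattern.
    V = [[2, 1, 0, 0], [4, 2, 0, 0], [0, 0, 3, 1],
         [0, 0, 6, 2], [2, 1, 3, 1], [6, 3, 3, 1]]
    W, H = nmf(V, k=2, iters=2000)
    for rule in H:
        print({item: round(x, 2) for item, x in zip(items, rule)})
    print("relative error:", round(relative_error(V, W, H), 4))
```

Every entry of W and H stays non-negative because each update multiplies by a ratio of non-negative terms. PCA eigenvectors carry no such sign constraint. Setting `sparsity` above 0 pushes each transaction toward fewer active rules. Combined with the row normalization this is only a heuristic, so the objective is no longer guaranteed to decrease monotonically.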

[1] David A. Bell et al. The rough set approach to association rule mining, 2003, Third IEEE International Conference on Data Mining.

[2] Ramakrishnan Srikant et al. Mining quantitative association rules in large relational tables, 1996, SIGMOD '96.

[3] Thomas Hofmann et al. Unsupervised Learning by Probabilistic Latent Semantic Analysis, 2004, Machine Learning.

[4] Ramakrishnan Srikant et al. Fast Algorithms for Mining Association Rules in Large Databases, 1994, VLDB.

[5] Ran Wolff et al. A high-performance distributed algorithm for mining association rules, 2003, Third IEEE International Conference on Data Mining.

[6] Srinivasan Parthasarathy et al. Parallel Algorithms for Discovery of Association Rules, 1997, Data Mining and Knowledge Discovery.

[7] Yehuda Lindell et al. A Statistical Theory for Quantitative Association Rules, 1999, KDD '99.

[8] Rakesh Agrawal et al. Fast Algorithms for Mining Association Rules, 1994, VLDB.

[9] David J. Field et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images, 1996, Nature.

[10] William I. Grosky et al. Narrowing the semantic gap - improved text-based web document retrieval using visual features, 2002, IEEE Trans. Multimedia.

[11] Jun-Lin Lin et al. Mining association rules: anti-skew algorithms, 1998, Proceedings 14th International Conference on Data Engineering.

[12] Reda Alhajj et al. Facilitating fuzzy association rules mining by using multi-objective genetic algorithms for automated clustering, 2003, Third IEEE International Conference on Data Mining.

[13] Ke Wang et al. Mining association rules from stars, 2002, Proceedings of the 2002 IEEE International Conference on Data Mining.

[14] Wei Li et al. New parallel algorithms for fast discovery of association rules, 1997.

[15] Philip S. Yu et al. Using a Hash-Based Method with Transaction Trimming for Mining Association Rules, 1997, IEEE Trans. Knowl. Data Eng.

[16] Jian Pei et al. CMAR: accurate and efficient classification based on multiple class-association rules, 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[17] Philip S. Yu et al. Mining associations by pattern structure in large relational tables, 2002, Proceedings of the 2002 IEEE International Conference on Data Mining.

[19] H. Sebastian Seung et al. Learning the parts of objects by non-negative matrix factorization, 1999, Nature.

[20] Srinivasan Parthasarathy et al. Mining frequent itemsets in distributed and dynamic databases, 2003, Third IEEE International Conference on Data Mining.

[21] A. Schuster et al. Association rule mining in peer-to-peer systems, 2004, IEEE Trans. Syst. Man Cybern. Part B.

[22] Christos Faloutsos et al. Ratio Rules: A New Paradigm for Fast, Quantifiable Data Mining, 1998, VLDB.

[23] Jiawei Han et al. Discovery of Multiple-Level Association Rules from Large Databases, 1995, VLDB.

[24] Tomasz Imielinski et al. Mining association rules between sets of items in large databases, 1993, SIGMOD Conference.