A New Approach for Association Rule Mining and Bi-clustering Using Formal Concept Analysis

Association rule mining and bi-clustering are data mining tasks that have become very popular in many application domains, particularly in bioinformatics. However, to our knowledge, no algorithm has yet been proposed that performs both tasks in a single process. We propose a new approach, called FIST, that jointly extracts bases of extended association rules and conceptual bi-clusters. This approach is based on the frequent closed itemsets framework and requires only a single scan of the database. It uses a new suffix-tree-based data structure that reduces memory usage, improves extraction efficiency, and allows the tree branches to be processed in parallel. Experiments conducted to assess its applicability to very large datasets show that FIST's memory requirements and execution times are, in most cases, equivalent to those of frequent-closed-itemset-based algorithms and lower than those of frequent-itemset-based algorithms.
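To make the link between closed itemsets, bi-clusters, and rule bases concrete, here is a minimal brute-force sketch. It is not FIST: it uses no suffix tree, it enumerates all 2^n itemsets instead of scanning the database once, and it is only suitable for toy data. The context `CONTEXT` and the helpers `extent`, `intent`, and `mine` are hypothetical names chosen for illustration. Each frequent closed itemset, paired with the objects that support it, is a formal concept (a maximal all-ones submatrix of the binary object/attribute table), which is the sense in which it is also a conceptual bi-cluster. Each minimal generator g whose closure h(g) differs from g yields an exact rule g → h(g) \ g, in the style of the generic basis for exact rules found in the closed-itemset literature.

```python
from itertools import combinations

# Hypothetical toy context (not from the paper): each object (e.g., a gene)
# maps to the set of attributes (e.g., conditions) it possesses.
CONTEXT = {
    "o1": frozenset("acd"),
    "o2": frozenset("bce"),
    "o3": frozenset("abce"),
    "o4": frozenset("be"),
    "o5": frozenset("abce"),
}


def extent(itemset, context):
    """Objects whose attribute set contains every item of `itemset`."""
    return frozenset(o for o, items in context.items() if itemset <= items)


def intent(objects, context):
    """Attributes shared by all `objects` (all attributes if `objects` is empty)."""
    shared = frozenset().union(*context.values())
    for o in objects:
        shared &= context[o]
    return shared


def mine(context, minsupp):
    """Brute-force sketch returning (conceptual bi-clusters, exact rules)."""
    items = sorted(frozenset().union(*context.values()))
    support = {frozenset(): len(context)}  # itemset -> number of supporting objects
    closure = {}                           # frequent itemset -> its closure
    biclusters = {}                        # closed itemset -> its extent (objects)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            ext = extent(x, context)
            if len(ext) < minsupp:
                continue
            support[x] = len(ext)
            closure[x] = intent(ext, context)
            biclusters[closure[x]] = ext
    rules = []
    for x, closed in closure.items():
        # x is a minimal generator if no immediate subset has the same support;
        # support is anti-monotone, so checking immediate subsets suffices.
        is_generator = all(support[x - {i}] != support[x] for i in x)
        if is_generator and closed != x:
            rules.append((x, closed - x, support[x]))  # exact rule, confidence 1
    return biclusters, rules


if __name__ == "__main__":
    biclusters, rules = mine(CONTEXT, minsupp=2)
    for items, objs in biclusters.items():
        print(sorted(objs), "x", sorted(items))
    for lhs, rhs, supp in rules:
        print(sorted(lhs), "->", sorted(rhs), "support", supp)
```

With `minsupp=2` on this toy context, the sketch finds five closed itemsets (for example {a, b, c, e}, supported by o3 and o5) and seven exact rules, such as a → c and ab → ce. Approximate rules (confidence below 1) are left out for brevity.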