An Effective Algorithm for Extracting Maximal Bipartite Cliques

Reducing bipartite clique enumeration to clique enumeration is a well-known approach for extracting maximal bipartite cliques. In this approach, graph inflation transforms the bipartite graph into a general graph, after which any maximal clique enumeration algorithm can be applied. However, the traditional inflation algorithm adds a new edge between every two vertices in the same vertex set. This incurs high computational overhead, which makes the approach impractical and unable to scale to large graphs. This paper proposes a new algorithm for extracting maximal bipartite cliques based on an efficient graph inflation algorithm. The proposed algorithm adds only the minimal number of edges required to convert all maximal bipartite cliques into maximal cliques. It has been evaluated on several real-world benchmark graphs with respect to correctness, running time (in both the inflation and enumeration steps), and the overhead that inflation imposes on the size of the generated general graph. The empirical evaluation indicates that the proposed algorithm is accurate, efficient, and effective, and that it is more applicable to real-world graphs than the traditional algorithm.
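
To make the baseline concrete, the following is a minimal sketch of the traditional inflation approach described above, not of the proposed algorithm, whose edge-selection rule is not given here. It assumes a bipartite graph supplied as two vertex lists and an edge list, and it uses a basic Bron-Kerbosch enumeration without pivoting as the maximal clique enumerator. The names `inflate`, `maximal_cliques`, and `maximal_bicliques` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def inflate(left, right, edges):
    """Traditional inflation (assumed form): keep every bipartite edge and
    add an edge between every two vertices that lie on the same side."""
    adj = {v: set() for v in list(left) + list(right)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for side in (left, right):
        for a, b in combinations(side, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def maximal_bicliques(left, right, edges):
    """Map each maximal clique of the inflated graph back to a pair
    (left part, right part); cliques lying entirely on one side are
    degenerate and are dropped."""
    left_set = set(left)
    result = set()
    for clique in maximal_cliques(inflate(left, right, edges)):
        a = frozenset(clique & left_set)
        b = frozenset(clique - left_set)
        if a and b:
            result.add((a, b))
    return result


if __name__ == "__main__":
    left = ["u1", "u2", "u3"]
    right = ["v1", "v2"]
    edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2"), ("u3", "v2")]
    for a, b in maximal_bicliques(left, right, edges):
        print(sorted(a), sorted(b))
```

In this sketch, inflation adds one edge per same-side vertex pair, so the number of added edges grows quadratically with the size of each vertex set. This is the overhead that motivates adding fewer edges.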
