A novel framework for detecting maximally banded matrices in binary data

Binary data arise in many real‐world applications, ranging from social networks to bioinformatics, so extracting patterns from binary data is a fundamental task in data mining. Recently, the utility of banded structures in binary matrices has been pointed out for applications such as paleontology, bioinformatics, and social networking. A binary matrix has a banded structure if both the rows and columns can be permuted so that the 1s exhibit a staircase pattern down the rows, along the leading diagonal. Natural interpretations of banded structures include overlapping communities in social networks, patterns of species occurring in spatially correlated sites, and overlapping roles of genes in various diseases. In this paper, we establish a correspondence between formal concept analysis and banded structures; as a direct result of this correspondence, we present a novel framework for discovering banded structures. Using this framework, we develop the MMBS algorithm (mine maximally banded submatrices). The current state‐of‐the‐art algorithm, MBS, discovers only a single band and assumes a fixed column permutation. In contrast, MMBS can discover multiple bands, which may overlap or be segmented. Our experimental results on both synthetic and real datasets clearly indicate the advantage of MMBS over MBS. Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 431‐445, 2010
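
The staircase condition in the definition above can be made concrete with a small check. The following is a minimal sketch, not the MBS or MMBS algorithm. It assumes the common formalization of a banded matrix: the 1s in each row are contiguous, and both the first and the last column holding a 1 are non-decreasing down the rows. It also assumes, as the abstract says MBS does, that the column order is fixed, so only rows are permuted. The function names `row_interval`, `is_banded`, and `banded_row_order` are hypothetical. For simplicity, an all-zero row is treated as violating the condition.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def row_interval(row: List[int]) -> Optional[Tuple[int, int]]:
    """Return (first, last) column index of the 1s in `row`, or None if the
    row is all zeros or its 1s are not contiguous (assumption of this sketch)."""
    ones = [j for j, v in enumerate(row) if v == 1]
    if not ones:
        return None
    first, last = ones[0], ones[-1]
    # Consecutive-ones property: every entry between first and last must be 1.
    if last - first + 1 != len(ones):
        return None
    return first, last


def is_banded(matrix: Matrix) -> bool:
    """Check the staircase condition for the rows in their current order:
    start and end columns of each row's 1s must never move left."""
    prev: Optional[Tuple[int, int]] = None
    for row in matrix:
        interval = row_interval(row)
        if interval is None:
            return False
        if prev is not None and (interval[0] < prev[0] or interval[1] < prev[1]):
            return False
        prev = interval
    return True


def banded_row_order(matrix: Matrix) -> Optional[List[int]]:
    """With the column order held fixed, return a row permutation that makes
    the matrix banded, or None if no such permutation exists."""
    intervals = []
    for i, row in enumerate(matrix):
        interval = row_interval(row)
        if interval is None:
            return None
        intervals.append((interval, i))
    # Sorting by (first, last) makes the start columns non-decreasing. If the
    # end columns still decrease somewhere, one row starts strictly earlier and
    # ends strictly later than another, and no row order can fix that.
    intervals.sort()
    order = [i for _, i in intervals]
    if is_banded([matrix[i] for i in order]):
        return order
    return None
```

For example, with `M = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]`, `is_banded(M)` is `False` in the given order, while `banded_row_order(M)` returns `[1, 0, 2]`, which puts the rows into staircase form. Searching over column permutations, bands within submatrices, and multiple overlapping or segmented bands is what makes the problem studied in the paper harder than this fixed-column check.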
