BEMI: Bicluster Ensemble Using Mutual Information

Biclustering solutions generally depend on parameters such as the number of biclusters and the random initialisation. Ensemble techniques have been used to eliminate the impact of such parameters on the output. In this paper, we present a novel ensemble technique for biclustering solutions using mutual information. Unlike existing approaches, the proposed technique does not require the biclusters to be aligned; as a result, it removes the requirement that all input biclustering solutions produce the same number of biclusters. Moreover, most existing approaches require the user to specify the number of output biclusters, whereas our approach determines the number of well-separated biclusters from the input solutions themselves. Experiments on synthetic and real datasets show that our approach achieves a lower biclustering error than both the input solutions and the ensemble techniques of Hanczar et al. [17].
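The abstract does not state how mutual information is computed between biclusters. As a minimal sketch, assuming each bicluster is treated as a binary membership indicator over the cells of the data matrix (an illustrative choice, not necessarily the BEMI formulation), the Python below computes a normalized mutual information between any two biclusters. Because every pair of biclusters is compared directly, solutions with different numbers of biclusters can be pooled without first aligning them. The functions `cell_mutual_information`, `normalized_mi`, and `pairwise_similarity` are hypothetical names introduced only for this illustration.

```python
import math

# Illustrative assumption: a bicluster is a (rows, cols) pair of sets, and a
# matrix cell is drawn uniformly at random. This is NOT the BEMI algorithm.


def cell_mutual_information(b1, b2, n_rows, n_cols):
    """MI (in nats) between the cell-membership indicators of b1 and b2."""
    r1, c1 = b1
    r2, c2 = b2
    total = n_rows * n_cols
    n11 = len(r1 & r2) * len(c1 & c2)  # cells lying in both biclusters
    n1_ = len(r1) * len(c1)            # cells lying in b1
    n_1 = len(r2) * len(c2)            # cells lying in b2
    joint = {
        (1, 1): n11,
        (1, 0): n1_ - n11,
        (0, 1): n_1 - n11,
        (0, 0): total - n1_ - n_1 + n11,
    }
    px = {1: n1_ / total, 0: 1 - n1_ / total}
    py = {1: n_1 / total, 0: 1 - n_1 / total}
    mi = 0.0
    for (x, y), count in joint.items():
        if count > 0:  # zero-probability cells contribute nothing
            pxy = count / total
            mi += pxy * math.log(pxy / (px[x] * py[y]))
    return mi


def entropy(p):
    """Entropy (in nats) of a Bernoulli(p) indicator."""
    return -sum(q * math.log(q) for q in (p, 1 - p) if q > 0)


def normalized_mi(b1, b2, n_rows, n_cols):
    """MI divided by sqrt(H1 * H2); 1.0 for identical non-trivial biclusters."""
    total = n_rows * n_cols
    h1 = entropy(len(b1[0]) * len(b1[1]) / total)
    h2 = entropy(len(b2[0]) * len(b2[1]) / total)
    if h1 == 0 or h2 == 0:  # empty or full-matrix bicluster carries no information
        return 0.0
    return cell_mutual_information(b1, b2, n_rows, n_cols) / math.sqrt(h1 * h2)


def pairwise_similarity(solutions, n_rows, n_cols):
    """Similarity matrix over all biclusters pooled from all input solutions."""
    pooled = [b for solution in solutions for b in solution]
    return [[normalized_mi(a, b, n_rows, n_cols) for b in pooled] for a in pooled]


if __name__ == "__main__":
    # Two hypothetical input solutions with 2 and 3 biclusters on a 4 x 4 matrix.
    solutions = [
        [({0, 1}, {0, 1}), ({2, 3}, {2, 3})],
        [({0, 1}, {0, 1}), ({2, 3}, {2}), ({1, 2}, {1, 2})],
    ]
    for row in pairwise_similarity(solutions, 4, 4):
        print(" ".join(f"{v:.2f}" for v in row))
```

The resulting matrix could then be fed to any grouping step (for example, thresholding or graph partitioning) to merge similar biclusters. How BEMI itself groups biclusters and chooses their number is not specified in the abstract.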

[1]  Brad T. Sherman et al. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources, 2008, Nature Protocols.

[2]  Joydeep Ghosh et al. Matching and Visualization of Multiple Overlapping Clusterings of Microarray Data, 2007, 2007 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology.

[3]  Joachim M. Buhmann et al. Bagging for Path-Based Clustering, 2003, IEEE Trans. Pattern Anal. Mach. Intell.

[4]  Lothar Thiele et al. A systematic comparison and evaluation of biclustering methods for gene expression data, 2006, Bioinform.

[5]  Vipin Kumar et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, 1998, SIAM J. Sci. Comput.

[6]  Andrea Tagarelli et al. Projective clustering ensembles, 2009, 2009 Ninth IEEE International Conference on Data Mining.

[7]  Sandrine Dudoit et al. Bagging to Improve the Accuracy of A Clustering Procedure, 2003, Bioinform.

[8]  George M. Church et al. Biclustering of Expression Data, 2000, ISMB.

[9]  Kathryn B. Laskey et al. Nonparametric Bayesian Co-clustering Ensembles, 2011, SDM.

[10]  Neelima Gupta et al. MIB: Using mutual information for biclustering gene expression data, 2010, Pattern Recognit.

[11]  Joydeep Ghosh et al. Cluster Ensembles: A Knowledge Reuse Framework for Combining Multiple Partitions, 2002, J. Mach. Learn. Res.

[12]  Anil K. Jain et al. The bootstrap approach to clustering, 1987.

[13]  Richard M. Karp et al. Discovering local structure in gene expression data: the order-preserving submatrix problem, 2002, RECOMB '02.

[14]  Ana L. N. Fred et al. Finding Consistent Clusters in Data Partitions, 2001, Multiple Classifier Systems.

[15]  Sven Bergmann et al. Iterative signature algorithm for the analysis of large-scale gene expression data, 2002, Physical Review E, Statistical, Nonlinear, and Soft Matter Physics.

[16]  Xiaohua Hu et al. Cluster Ensemble and Its Applications in Gene Expression Analysis, 2004, APBC.

[17]  Blaise Hanczar et al. Using the bagging approach for biclustering of gene expression data, 2011, Neurocomputing.

[18]  Neelima Gupta et al. BiETopti-BiClustering Ensemble Using Optimization Techniques, 2013, ICDM.

[19]  Vikas Singh et al. Ensemble clustering using semidefinite programming with applications, 2010, Machine Learning.

[20]  Anil K. Jain et al. Adaptive clustering ensembles, 2004, ICPR 2004.

[21]  Anil K. Jain et al. Combining multiple weak clusterings, 2003, Third IEEE International Conference on Data Mining.

[22]  Ana L. N. Fred et al. Data clustering using evidence accumulation, 2002, Object recognition supported by user interaction for service robots.

[23]  Roded Sharan et al. Discovering statistically significant biclusters in gene expression data, 2002, ISMB.

[24]  T. M. Murali et al. Extracting Conserved Gene Expression Motifs from Gene Expression Data, 2002, Pacific Symposium on Biocomputing.