New heuristics for the Bicluster Editing Problem

The NP-hard Bicluster Editing Problem (BEP) consists of editing a minimum number of edges of an input bipartite graph G in order to transform it into a vertex-disjoint union of complete bipartite subgraphs. Editing an edge consists of either adding it to the graph or deleting it from the graph. Applications of the BEP include data mining and analysis of gene expression data. In this work, we generate and analyze random bipartite instances for the BEP to perform empirical tests. A new reduction rule for the problem is proposed, based on the concept of critical independent sets, providing an effective reduction in the size of the instances. We also propose a set of heuristics using concepts of the metaheuristics ILS, VNS, and GRASP, including a constructive heuristic based on analyzing vertex neighborhoods, three local search procedures, and an auxiliary data structure to speed up the local search. Computational experiments show that our heuristics outperform other methods from the literature with respect to both solution quality and computational time.
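To make the objective concrete, the following minimal sketch counts the edits that a given candidate biclustering would require on a small bipartite graph. It illustrates the problem definition only, not the heuristics or the reduction rule proposed in this work. The function name `editing_cost`, the representation of clusters as pairs of vertex sets, and the toy instance are all assumptions introduced for illustration.

```python
from itertools import product


def editing_cost(left, right, edges, clusters):
    """Count the edge edits implied by a candidate biclustering.

    left, right: the two vertex classes of the bipartite graph.
    edges: iterable of (u, v) pairs with u in left and v in right.
    clusters: list of (left_subset, right_subset) pairs that together
        partition the vertices (hypothetical representation).
    """
    edge_set = set(edges)

    # Map every vertex to the index of the bicluster that contains it.
    left_label = {}
    right_label = {}
    for idx, (left_part, right_part) in enumerate(clusters):
        for u in left_part:
            left_label[u] = idx
        for v in right_part:
            right_label[v] = idx

    cost = 0
    for u, v in product(left, right):
        same_cluster = left_label[u] == right_label[v]
        present = (u, v) in edge_set
        # A missing edge inside a bicluster must be added;
        # an existing edge between two biclusters must be deleted.
        if same_cluster != present:
            cost += 1
    return cost


# Toy instance (made up for illustration).
left = [0, 1, 2]
right = ["a", "b", "c"]
edges = [(0, "a"), (0, "b"), (1, "a"), (1, "c"), (2, "c")]
clusters = [({0, 1}, {"a", "b"}), ({2}, {"c"})]

print(editing_cost(left, right, edges, clusters))
```

In this toy instance the candidate solution needs two edits: adding the edge (1, b) inside the first bicluster and deleting the edge (1, c) between the two biclusters. The BEP asks for the partition that minimizes this count.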
