BiCluE - Exact and heuristic algorithms for weighted bi-cluster editing of biomedical data

Background
The explosion of biological data has dramatically transformed today's biological research. The biggest challenge for biologists and bioinformaticians is to integrate and analyze large quantities of data to gain meaningful insights. One major problem is the combined analysis of data of different types. Bi-cluster editing, a special case of clustering that partitions two different types of data simultaneously, may be applied in several biomedical scenarios. However, the underlying algorithmic problem is NP-hard.

Results
Here we present BiCluE, a software package designed to solve the weighted bi-cluster editing problem. It implements (1) an exact algorithm based on fixed-parameter tractability and (2) a polynomial-time greedy heuristic that solves the hardest part, edge deletions, first. We evaluated its performance on artificial graphs and then, as an example, applied our implementation to real-world biomedical data, in this case genome-wide association study (GWAS) data. In general, BiCluE works on any data that can be modeled as a (weighted or unweighted) bipartite graph.

Conclusions
To our knowledge, this is the first software package that solves the weighted bi-cluster editing problem. BiCluE and the supplementary results are available online at http://biclue.mpi-inf.mpg.de.
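To make the idea of an exact algorithm based on fixed-parameter tractability more concrete, the Python sketch below shows a textbook bounded search tree for weighted bi-cluster editing: turn a weighted bipartite graph into a disjoint union of complete bipartite graphs (bicliques) at minimum total edit cost. It is an illustration under stated assumptions, not BiCluE's actual implementation. It assumes a hypothetical weight convention in which `s[(u, v)] > 0` marks a present edge that costs `s` to delete and `s[(u, v)] <= 0` marks an absent edge that costs `-s` to insert, with every left/right pair listed. The function names `find_induced_p4` and `bicluster_edit` are hypothetical. The sketch relies on the fact that a bipartite graph is a bi-cluster graph exactly when it contains no induced path on four vertices (P4), so every solution must edit at least one pair of any such path.

```python
import math
from itertools import product

# Hypothetical weight convention (an assumption, not BiCluE's input format):
# s[(u, v)] > 0  -> edge u-v is present; deleting it costs s[(u, v)]
# s[(u, v)] <= 0 -> edge u-v is absent; inserting it costs -s[(u, v)]
FORBIDDEN = -math.inf  # pair fixed as "absent" after a deletion
PERMANENT = math.inf   # pair fixed as "present" after an insertion


def find_induced_p4(left, right, s):
    """Return an induced P4 u1-v1-u2-v2 (with u1-v2 missing), or None.

    A bipartite graph with no induced P4 is a disjoint union of bicliques."""
    for u1, v1, u2, v2 in product(left, right, left, right):
        if u1 == u2 or v1 == v2:
            continue
        if s[u1, v1] > 0 and s[u2, v1] > 0 and s[u2, v2] > 0 and s[u1, v2] <= 0:
            return (u1, v1), (u2, v1), (u2, v2), (u1, v2)
    return None


def bicluster_edit(left, right, s, budget):
    """Return (cost, edits) of a minimum-cost solution with cost <= budget, or None."""
    p4 = find_induced_p4(left, right, s)
    if p4 is None:
        return 0.0, []  # already a bi-cluster graph
    *edges, missing = p4
    # Any solution must edit one of these four pairs, so branch on each of them.
    branches = [("delete", pair, s[pair], FORBIDDEN)
                for pair in edges if s[pair] != PERMANENT]
    if s[missing] != FORBIDDEN:
        branches.append(("insert", missing, -s[missing], PERMANENT))
    best = None
    for op, pair, cost, fixed in branches:
        if cost > budget:
            continue
        s2 = dict(s)
        s2[pair] = fixed  # never undo an edit; this guarantees termination
        sub = bicluster_edit(left, right, s2, budget - cost)
        if sub is not None and (best is None or cost + sub[0] < best[0]):
            best = (cost + sub[0], [(op, pair)] + sub[1])
            budget = best[0]  # branch and bound: later branches must beat this


    return best


# Usage: a single induced P4 g1-d1-g2-d2 whose missing pair g1-d2 is cheap to add.
left, right = ["g1", "g2"], ["d1", "d2"]
s = {("g1", "d1"): 2.0, ("g2", "d1"): 1.0, ("g2", "d2"): 3.0, ("g1", "d2"): -0.5}
print(bicluster_edit(left, right, s, budget=10.0))
# (0.5, [('insert', ('g1', 'd2'))])
```

Each branch fixes the edited pair permanently, so the recursion always terminates. If every edit costs at least 1, the depth is at most the budget k and the tree has O(4^k) nodes, each needing only polynomial work. This is what makes the problem fixed-parameter tractable in the editing cost. Practical exact solvers often add data reduction rules and refined branching. The greedy deletion-first heuristic mentioned above is not shown here.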