Configurable pattern-based evolutionary biclustering of gene expression data

Background
Biclustering algorithms for microarray data aim to discover functionally related gene sets under different subsets of experimental conditions. Because of the complexity of the problem and the characteristics of microarray datasets, heuristic searches are usually used instead of exhaustive algorithms. Comparing different techniques also remains a challenge: the results obtained vary in relevant features such as the number of genes or conditions, which makes a fair comparison difficult. Moreover, existing approaches do not allow the user to specify any preferences on these properties.

Results
Here, we present the first biclustering algorithm in which several bicluster features can be customized in terms of different objectives. This can be done by tuning the specified features in the algorithm or by incorporating new objectives into the search. Furthermore, our approach bases the bicluster evaluation on expression patterns and is able to recognize both shifting and scaling patterns, either simultaneously or separately. Evolutionary computation has been chosen as the search strategy, hence the name of our proposal, Evo-Bexpa (Evolutionary Biclustering based in Expression Patterns).

Conclusions
We have conducted experiments on both synthetic and real datasets that demonstrate the ability of Evo-Bexpa to obtain meaningful biclusters. The synthetic experiments were designed to compare the performance of Evo-Bexpa with that of other approaches when searching for perfect patterns. Experiments with four different real datasets also confirm the proper performance of our algorithm, whose results have been biologically validated through Gene Ontology.
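To make the notion of shifting and scaling patterns mentioned in the abstract more concrete, the following minimal Python sketch builds small synthetic biclusters and checks whether their rows follow such a pattern. It assumes the usual formulation in which each gene (row) equals a common base profile multiplied by a gene-specific scaling factor plus a gene-specific shifting term. The function names (`make_pattern_bicluster`, `pearson`, `follows_pattern`) and the correlation-based check are illustrative assumptions and are not the evaluation measure used by Evo-Bexpa.

```python
import math


def make_pattern_bicluster(base, scales, shifts):
    """Hypothetical helper: row i is scales[i] * base + shifts[i]."""
    return [[a * p + b for p in base] for a, b in zip(scales, shifts)]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def follows_pattern(rows, tol=1e-9):
    """True if every row is a linear transform of the first row.

    Rows related by shifting and/or scaling (non-zero scale) have an
    absolute Pearson correlation of 1 with each other. This is only an
    illustrative check, not the fitness function of the paper.
    """
    ref = rows[0]
    return all(abs(abs(pearson(ref, r)) - 1.0) <= tol for r in rows[1:])


# Assumed toy data: one base profile over four conditions.
base = [1.0, 4.0, 2.0, 8.0]

shifting = make_pattern_bicluster(base, [1.0, 1.0, 1.0], [0.0, 5.0, -3.0])
scaling = make_pattern_bicluster(base, [1.0, 2.0, 0.5], [0.0, 0.0, 0.0])
combined = make_pattern_bicluster(base, [1.0, 2.0, -0.5], [0.0, 5.0, -3.0])
unrelated = [base, [8.0, 1.0, 4.0, 2.0]]  # second row is a permutation of base

for name, rows in [("shifting", shifting), ("scaling", scaling),
                   ("combined", combined), ("unrelated", unrelated)]:
    print(name, follows_pattern(rows))
```

The check relies on the fact that a shift or a non-zero scaling of a profile does not change the absolute value of its Pearson correlation with the original profile, so a perfect shifting, scaling, or combined pattern yields an absolute correlation of 1 between any pair of rows. Constant rows are not handled, since their correlation is undefined.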
