Bi-Phase Evolutionary Searching for Biclusters in Gene Expression Data

Analyzing gene expression data helps uncover biological information about genes. Biclustering of microarray data has been proposed as a powerful computational tool for discovering subsets of genes that exhibit consistent expression patterns across subsets of conditions. In this paper, we propose a novel biclustering algorithm called the bi-phase evolutionary biclustering algorithm. The first phase evolves rows and columns, and the second phase evolves biclusters. The interaction between the two phases ensures a reliable search direction and accelerates convergence to good solutions. Furthermore, the population is initialized with a conventional hierarchical clustering strategy that discovers bicluster seeds. We also develop a seed-based parallel implementation of the evolutionary search so that biclusters are searched more comprehensively. The performance of the proposed algorithm is compared with that of several popular biclustering algorithms on synthetic datasets and real microarray datasets. The experimental results show that the proposed algorithm significantly improves the discovery of biclusters.
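To make the notion of a "consistent expression pattern" concrete, the following minimal sketch scores a candidate bicluster on a toy matrix. The abstract does not state which coherence measure the algorithm optimizes; the mean squared residue used here is one common choice from the biclustering literature and is an assumption made for illustration only. The function name, the toy matrix, and the selected row and column indices are likewise hypothetical.

```python
from statistics import mean


def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix picked out by rows and cols.

    A score of 0 means every selected row differs from the others only by
    an additive shift. This measure is an illustrative assumption, not
    necessarily the fitness used by the bi-phase algorithm.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    row_means = [mean(r) for r in sub]
    col_means = [mean(r[k] for r in sub) for k in range(len(cols))]
    overall = mean(row_means)  # rows have equal length, so this is the grand mean
    total = 0.0
    for a, r in enumerate(sub):
        for b, value in enumerate(r):
            residue = value - row_means[a] - col_means[b] + overall
            total += residue * residue
    return total / (len(rows) * len(cols))


# Hypothetical toy expression matrix: rows are genes, columns are conditions.
expression = [
    [1.0, 2.0, 9.0, 4.0],
    [2.0, 3.0, 1.0, 5.0],
    [7.0, 0.0, 3.0, 2.0],
    [4.0, 5.0, 6.0, 7.0],
]

# Genes 0, 1, 3 under conditions 0, 1, 3 follow the same shifted pattern.
coherent = mean_squared_residue(expression, rows=[0, 1, 3], cols=[0, 1, 3])
whole = mean_squared_residue(expression, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
print(coherent, whole)
```

In this toy case the selected submatrix scores essentially zero while the full matrix does not, which is the kind of contrast a biclustering search tries to find by choosing rows and columns jointly.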
