Scatter search-based identification of local patterns with positive and negative correlations in gene expression data

Highlights: Biclustering of gene expression data. Scatter search metaheuristic. Correlation-based merit function. Positive and negative correlations among genes. Comparison is based on a priori biological information.

This paper presents a scatter search approach that uses linear correlations among genes to find biclusters. Unlike most correlation-based algorithms published in the literature, it captures negatively correlated patterns as well as shifting and scaling patterns. The comparison methodology is based on a priori biological information stored in the well-known Gene Ontology (GO) repository, using all three GO categories: Biological Process, Cellular Component and Molecular Function. The performance of the proposed algorithm is compared with that of other benchmark biclustering algorithms, specifically a group of classical biclustering algorithms and two algorithms that use correlation-based merit functions. The proposed algorithm outperforms the benchmark algorithms and finds patterns based on negative correlations. Although these patterns capture important relationships among genes, most biclustering algorithms do not find them. The experimental study also shows that the size of a bicluster matters in addition to the value of its correlation: in particular, the size of a bicluster influences its enrichment in a GO term.
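The abstract does not state the merit function itself, so the following is only a minimal sketch of the general idea, assuming a hypothetical merit defined as the mean absolute Pearson correlation over all gene pairs in a bicluster. The function name `mean_abs_correlation` and the toy data are illustrative and are not taken from the paper. Taking the absolute value is what lets a negatively correlated gene score as highly as a shifted and scaled one.

```python
import numpy as np

def mean_abs_correlation(data, genes, conditions):
    """Hypothetical merit: average |Pearson r| over all gene pairs
    of the submatrix selected by the given gene and condition indices."""
    sub = data[np.ix_(genes, conditions)]   # rows are genes, columns are conditions
    corr = np.corrcoef(sub)                 # gene-by-gene correlation matrix
    upper = np.triu_indices(len(genes), k=1)
    return float(np.mean(np.abs(corr[upper])))

base = np.array([1.0, 3.0, 2.0, 5.0])
data = np.vstack([
    base,                            # reference gene
    2.0 * base + 1.0,                # shifting and scaling pattern
    -1.5 * base + 4.0,               # negatively correlated pattern
    np.array([2.0, 2.5, 7.0, 1.0]),  # gene with no linear relation to the others
])

conditions = [0, 1, 2, 3]
score_pattern = mean_abs_correlation(data, [0, 1, 2], conditions)
score_all = mean_abs_correlation(data, [0, 1, 2, 3], conditions)
print(score_pattern, score_all)
```

In this toy example the first three genes form a perfectly correlated bicluster under the assumed merit, and adding the unrelated fourth gene lowers the score. A real merit function would also need to account for bicluster size, which the abstract identifies as an influence on GO enrichment.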
