TriGen: A genetic algorithm to mine triclusters in temporal gene expression data

Analyzing microarray data is a computational challenge because of the characteristics of these data. Clustering techniques are widely applied to create groups of genes that exhibit similar behavior under the conditions tested. Biclustering improves on classical clustering by relaxing the grouping constraint: genes need to behave similarly only under a subset of the conditions rather than under all of them. However, biclustering is not appropriate for the analysis of longitudinal experiments, in which the genes are evaluated under certain conditions at several time points. We present TriGen, a genetic algorithm that finds triclusters of gene expression by taking the experimental conditions and the time points into account simultaneously. We have applied TriGen to synthetic data and to datasets from yeast (Saccharomyces cerevisiae) cell cycle and human inflammation and host response to injury experiments. TriGen has proved capable of extracting groups of genes with similar patterns in subsets of conditions and times, and these groups have been shown to be related in terms of their functional annotations extracted from the Gene Ontology.
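To make the notion of a tricluster concrete, the following minimal sketch represents one as three index lists (genes, conditions, time points) over a 3D expression array and scores its coherence. The function names `extract_tricluster` and `msr3d`, the array layout, and the choice of a three-dimensional mean squared residue as the score are assumptions made for illustration; the abstract does not specify the fitness function TriGen uses, so this should not be read as the authors' implementation.

```python
import numpy as np


def extract_tricluster(data, genes, conditions, times):
    """Return the sub-array selected by three index lists.

    `data` is assumed to have shape (genes, conditions, time points).
    """
    return data[np.ix_(genes, conditions, times)]


def msr3d(sub):
    """Illustrative coherence score: mean squared residue in three dimensions.

    The score is zero when every value in `sub` is a sum of a gene effect,
    a condition effect, and a time effect, and grows as the sub-array
    departs from that additive pattern. Lower means more coherent.
    """
    m = sub.mean()
    m_g = sub.mean(axis=(1, 2), keepdims=True)   # per-gene mean
    m_c = sub.mean(axis=(0, 2), keepdims=True)   # per-condition mean
    m_t = sub.mean(axis=(0, 1), keepdims=True)   # per-time mean
    m_gc = sub.mean(axis=2, keepdims=True)       # gene x condition mean
    m_gt = sub.mean(axis=1, keepdims=True)       # gene x time mean
    m_ct = sub.mean(axis=0, keepdims=True)       # condition x time mean
    residue = sub - m_gc - m_gt - m_ct + m_g + m_c + m_t - m
    return float((residue ** 2).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(20, 5, 6))           # hypothetical toy dataset
    genes, conditions, times = [1, 4, 7], [0, 2], [1, 2, 3]
    candidate = extract_tricluster(data, genes, conditions, times)
    print("score of a random candidate:", msr3d(candidate))
```

In a genetic algorithm of the kind the abstract describes, each individual in the population could hold such a triple of index lists, and a score like this one could serve as the quantity to minimize while crossover and mutation add or remove genes, conditions, and time points.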
