A New Biclustering Algorithm for Time-Series Gene Expression Data Analysis

Biclustering algorithms are an important tool for finding local patterns in gene expression data. However, most biclusters found by existing biclustering algorithms consist of non-contiguous columns, which makes them unsuitable for time-series gene expression data, a setting that has not been extensively studied. This paper presents an efficient exact algorithm that searches for coherent-evolution biclusters with contiguous columns in time-series data. The algorithm first transforms the original matrix into a difference matrix. Then, starting from column patterns consisting of k contiguous columns, it gradually obtains longer patterns composed of more columns, using a prefix tree and a node-update strategy to improve efficiency. Experimental results on real data show that the algorithm can find biclusters with statistical significance and strong biological relevance.
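
The two steps described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: it replaces the prefix tree and node-update strategy with a plain dictionary grouping over every window, and it does not filter for maximal biclusters. The function names, the `U`/`D`/`N` symbol alphabet, the `threshold` and `min_rows` parameters, and the reading of k as a number of difference-matrix columns are all assumptions made for illustration.

```python
from collections import defaultdict


def difference_matrix(matrix, threshold=0.0):
    """Encode each row as symbols for the change between adjacent time points.

    'U' = up, 'D' = down, 'N' = no change (hypothetical alphabet).
    """
    encoded = []
    for row in matrix:
        symbols = []
        for prev, curr in zip(row, row[1:]):
            delta = curr - prev
            if delta > threshold:
                symbols.append("U")
            elif delta < -threshold:
                symbols.append("D")
            else:
                symbols.append("N")
        encoded.append("".join(symbols))
    return encoded


def contiguous_biclusters(matrix, k=2, min_rows=2, threshold=0.0):
    """Brute-force search for rows sharing a pattern over contiguous columns.

    Starts with windows of k difference columns and grows the window length,
    grouping rows by their symbol string inside each window.
    Returns tuples (rows, first_time_point, last_time_point, pattern),
    where the time points index the ORIGINAL matrix and are inclusive.
    """
    diff = difference_matrix(matrix, threshold)
    n_diff_cols = len(diff[0]) if diff else 0
    results = []
    for length in range(k, n_diff_cols + 1):
        for start in range(n_diff_cols - length + 1):
            groups = defaultdict(list)
            for row_index, symbols in enumerate(diff):
                groups[symbols[start:start + length]].append(row_index)
            for pattern, rows in groups.items():
                if len(rows) >= min_rows:
                    # 'length' difference columns span 'length + 1' time points.
                    results.append((rows, start, start + length, pattern))
    return results


if __name__ == "__main__":
    # Toy data: three genes measured at four time points.
    expression = [
        [1, 2, 3, 2],
        [2, 3, 4, 1],
        [5, 4, 3, 4],
    ]
    for bicluster in contiguous_biclusters(expression, k=2, min_rows=2):
        print(bicluster)
```

In this toy example the first two genes rise, rise, and then fall together, so they are grouped over every window of at least two difference columns. The exhaustive rescan of each window is what a prefix tree would be expected to avoid, since longer patterns could then be derived from shorter ones that share a prefix.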
