Discovering pan-correlation patterns from time course data sets by efficient mining algorithms

Correlation patterns in time-course data can be positive or negative, and can be time-lagged with gaps. Mining all of these correlation patterns helps to gain broad insight into variable dependencies. Here, we prove that diverse types of correlation patterns can be represented by a generalized form of positive correlation patterns. We prove a correspondence between positive correlation patterns and sequential patterns, and present an efficient single-scan algorithm for mining the correlations. Evaluations on synthetic time-course data sets and on yeast cell cycle gene expression data sets indicate that: (1) the running time of the algorithm grows linearly with the number of variables; (2) negative correlation patterns are abundant in real-world data sets; and (3) correlation patterns with time lags and gaps are abundant. Existing methods have discovered only incomplete forms of many of these patterns, and have missed some important patterns completely.
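The idea that negative and time-lagged correlations reduce to positive ones can be illustrated with a small sketch. This is not the paper's mining algorithm, and the abstract does not state which correlation measure the paper uses; the sketch assumes plain Pearson correlation, and the helper names (`pearson`, `negate`, `lagged_pair`) are hypothetical. It only shows that negating one series turns a negative correlation into a positive one, and that shifting one series by its lag turns a time-lagged correlation into an ordinary aligned one.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def negate(xs):
    """Flip the sign of every value (assumed transformation for illustration)."""
    return [-x for x in xs]


def lagged_pair(xs, ys, lag):
    """Align xs[t] with ys[t + lag] for a non-negative lag."""
    if lag == 0:
        return list(xs), list(ys)
    return list(xs[:-lag]), list(ys[lag:])


a = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
b = negate(a)            # moves exactly opposite to a
c = [0.0] + a[:-1]       # a delayed by one time step

r_negative = pearson(a, b)             # strongly negative
r_as_positive = pearson(a, negate(b))  # same relation, seen as positive

a_aligned, c_aligned = lagged_pair(a, c, 1)
r_lagged = pearson(a_aligned, c_aligned)  # positive once the lag is removed
```

In this toy setting, a single positive-correlation test applied to suitably transformed series covers all three cases, which is the intuition behind treating the diverse pattern types as one generalized form.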
