Paper information - Discovering pan-correlation patterns from time course data sets by efficient mining algorithms

Time-course correlation patterns can be positive or negative, and can be time-lagged with gaps. Mining all of these correlation patterns helps to gain broad insight into variable dependencies. Here, we prove that diverse types of correlation patterns can be represented by a generalized form of positive correlation patterns. We also prove a correspondence between positive correlation patterns and sequential patterns, and present an efficient single-scan algorithm for mining the correlations. Evaluations on synthetic time course data sets and on yeast cell cycle gene expression data sets indicate that: (1) the running time of the algorithm grows linearly with the number of variables; (2) negative correlation patterns are abundant in real-world data sets; and (3) correlation patterns with time lags and gaps are abundant. Existing methods have discovered only incomplete forms of many of these patterns, and have missed some important patterns completely.
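
The idea that negative and time-lagged correlations reduce to a generalized positive correlation over symbol sequences can be illustrated with a small sketch. This is not the paper's mining algorithm: the trend encoding (`U`, `D`, `N`), the function names, and the exact-match test are all assumptions made for illustration. Each series is encoded by the direction of its consecutive changes; a negative correlation becomes a positive one after the symbols are flipped, and a time lag becomes a shift of one sequence against the other.

```python
from typing import List


def trend_symbols(series: List[float]) -> List[str]:
    """Encode consecutive changes as 'U' (up), 'D' (down), or 'N' (no change)."""
    symbols = []
    for prev, cur in zip(series, series[1:]):
        if cur > prev:
            symbols.append("U")
        elif cur < prev:
            symbols.append("D")
        else:
            symbols.append("N")
    return symbols


def flip(symbols: List[str]) -> List[str]:
    """Invert trends so that a negative correlation reads as a positive one."""
    swap = {"U": "D", "D": "U", "N": "N"}
    return [swap[s] for s in symbols]


def matches(a: List[float], b: List[float], lag: int = 0, negative: bool = False) -> bool:
    """Check whether b follows a's trends, delayed by `lag` steps.

    Hypothetical helper: assumes both series have the same length and
    requires an exact match of trend symbols over the overlapping window.
    """
    sa = trend_symbols(a)
    sb = trend_symbols(b)
    if negative:
        sb = flip(sb)
    if lag < 0 or lag >= len(sa):
        return False
    # Compare a at step t with b at step t + lag.
    return sa[: len(sa) - lag] == sb[lag:]


x = [1, 2, 3, 2, 1, 2]
y = [5, 4, 3, 4, 5, 4]   # moves opposite to x
z = [9, 1, 2, 3, 2, 1]   # repeats x's movements one step later

print(matches(x, y, negative=True))  # negative correlation, no lag
print(matches(x, z, lag=1))          # positive correlation, lag of one step
```

Once every series is a symbol sequence, searching for groups of variables that share such subsequences is a sequential pattern problem, which is the kind of correspondence the abstract refers to. A real miner would also need to handle gaps and noise, which this sketch ignores.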