Dynamic Clustering of Gene Expression

It is well accepted that genes are simultaneously involved in multiple biological processes and that their expression is coordinated over the course of such events. Unfortunately, clustering methodologies that group genes for the purpose of novel gene discovery ignore the dynamic nature of biological processes and return static clusters, even when gene expression is measured across time points or developmental stages. Drawing on techniques and theory from time-frequency analysis, periodic gene expression profiles are dynamically clustered under the assumption that different spectral frequencies characterize different biological processes. A two-step cluster validation approach is proposed to statistically estimate the optimal number of clusters and to distinguish significant clusters from noise. The resulting clusters reveal groups of coordinately coexpressed genes. This dynamic clustering approach is broadly applicable to sequential data in which the order of the series matters.
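
The abstract does not state which time-frequency transform or grouping rule is used. As a minimal sketch of the core idea, that each time window can produce its own grouping, the Python below (NumPy only) computes a short-time Fourier transform of every gene's profile and, within each window, places genes that share a dominant frequency bin in the same cluster. The window length, hop size, periodic Hann taper, and the hypothetical helpers `stft_power`, `dynamic_clusters`, and `clusters_at` are illustrative assumptions, not the authors' method, which may rely on a different representation such as a wavelet transform.

```python
import numpy as np

# Illustrative sketch only: grouping genes by dominant STFT frequency bin is an
# assumed stand-in for the paper's dynamic clustering, not its actual algorithm.


def stft_power(x, win_len, hop):
    """Power spectrogram of a 1-D series, shape (n_windows, win_len // 2 + 1)."""
    # Periodic Hann taper (an assumed choice; any tapered window would work).
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(win_len) / win_len)
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = x[start:start + win_len]
        frames.append((seg - seg.mean()) * window)  # demean, then taper
    return np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2


def dynamic_clusters(expr, win_len=16, hop=4):
    """Label each gene in each time window with its dominant frequency bin.

    expr: (n_genes, n_times) array. Returns an (n_genes, n_windows) int array;
    genes sharing a label in a given window are co-clustered in that window.
    """
    labels = []
    for x in expr:
        power = stft_power(x, win_len, hop)
        # Skip bin 0 (the DC component) when choosing the dominant frequency.
        labels.append(np.argmax(power[:, 1:], axis=1) + 1)
    return np.array(labels)


def clusters_at(labels, w):
    """Return {frequency bin: [gene indices]} for time window w."""
    groups = {}
    for gene, k in enumerate(labels[:, w]):
        groups.setdefault(int(k), []).append(gene)
    return groups


# Toy data: with 16-sample windows, rfft bin k means k cycles per window.
t = np.arange(64)
rng = np.random.default_rng(0)
fast = np.sin(2 * np.pi * 4 * t / 16)  # 4 cycles per window -> bin 4
slow = np.sin(2 * np.pi * 1 * t / 16)  # 1 cycle per window  -> bin 1
switch = np.where(t < 32, slow, fast)  # changes program halfway through
expr = np.vstack([fast, slow, switch]) + 0.05 * rng.standard_normal((3, 64))

labels = dynamic_clusters(expr, win_len=16, hop=4)
print(clusters_at(labels, 0))                     # first window
print(clusters_at(labels, labels.shape[1] - 1))   # last window
```

In the toy data, the third gene groups with the slow gene in the first window and with the fast gene in the last window, which is the kind of membership change a static clustering would hide. Grouping by the single strongest bin is deliberately crude and is chosen only to keep the example short.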

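For the second validation step, separating significant clusters from noise, the abstract does not name a test statistic. One common choice, assumed here rather than taken from the source, is a permutation test: score a cluster by the mean pairwise Pearson correlation of its members and compare that score with the scores of equally sized random gene sets drawn from the same data. The hypothetical helpers `coherence` and `cluster_p_value`, the statistic, and the 999-permutation default are illustrative assumptions; the first step (estimating the number of clusters) is not sketched here.

```python
import numpy as np

# Illustrative sketch only: a permutation test on within-cluster correlation is
# an assumed stand-in for the paper's significance step, not its actual method.


def coherence(expr, members):
    """Mean pairwise Pearson correlation among the profiles in `members`."""
    r = np.corrcoef(expr[members])  # rows are genes, so r is (m, m)
    iu = np.triu_indices(len(members), k=1)
    return r[iu].mean()


def cluster_p_value(expr, members, n_perm=999, seed=0):
    """Permutation p-value: is the cluster more coherent than random gene sets?

    Null sets have the cluster's size and are drawn without replacement from
    all genes; the +1 terms keep the estimate away from an impossible p = 0.
    """
    rng = np.random.default_rng(seed)
    observed = coherence(expr, members)
    n_genes = expr.shape[0]
    null = np.array([
        coherence(expr, rng.choice(n_genes, size=len(members), replace=False))
        for _ in range(n_perm)
    ])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)


# Toy data: 100 noisy genes, of which genes 0-9 share a periodic signal.
rng = np.random.default_rng(1)
t = np.arange(48)
expr = 0.5 * rng.standard_normal((100, 48))
expr[:10] += np.sin(2 * np.pi * t / 12)

p_real = cluster_p_value(expr, list(range(10)))       # planted cluster
p_noise = cluster_p_value(expr, list(range(50, 60)))  # arbitrary noise genes
print(p_real, p_noise)
```

Because the null sets come from the same expression matrix, they share its noise level and any background correlation, so the test asks whether a cluster is tighter than chance groupings of these particular genes.
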