Temporal clustering by affinity propagation reveals transcriptional modules in Arabidopsis thaliana

MOTIVATION: Identifying regulatory modules is an important task in the exploratory analysis of gene expression time series data. Clustering algorithms are often used for this purpose. However, gene regulatory events may induce complex temporal features in a gene expression profile, including time delays, inversions and transient correlations, which current clustering methods do not account for well. As the cost of microarray experiments continues to fall, the temporal resolution of time course studies is increasing, and with it the need to take account of detailed temporal features of this kind. Thus, while standard clustering methods are both widely used and much studied, their shared shortcomings with respect to such temporal features motivate the work presented here.

RESULTS: We introduce a temporal clustering approach for high-dimensional gene expression data that takes account of time delays, inversions and transient correlations. We do so by exploiting a recently introduced, message-passing-based algorithm called Affinity Propagation (AP). Temporal features of interest are captured using an approximate but efficient dynamic programming approach due to Qian et al. The resulting approach is demonstrably effective in its ability to discern non-obvious temporal features, yet efficient and robust enough for routine use as an exploratory tool. We show results on validated transcription factor-target pairs in yeast and on gene expression data from a study of Arabidopsis thaliana under pathogen infection. The latter reveals a number of biologically striking findings.

AVAILABILITY: Matlab code for our method is available at http://www.wsbc.warwick.ac.uk/stevenkiddle/tcap.html.
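To make the kind of dynamic programming referred to in the results concrete, the following is a minimal Python sketch of a local-alignment style similarity between two expression profiles that tolerates a time delay, an inversion, or a correlation confined to part of the time course. It is an illustration based on the general idea of local clustering of time-shifted and inverted profiles, not the authors' Matlab implementation; the function names `standardize` and `local_similarity`, the sign convention for the delay, and the toy profiles are all assumptions made for this example.

```python
import math

def standardize(profile):
    """Scale a profile to zero mean and unit variance (assumes it is not constant)."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in profile) / n)
    return [(v - mean) / sd for v in profile]

def local_similarity(x, y):
    """Return (score, delay, inverted) for the best local match between x and y.

    pos accumulates products along each diagonal (co-expression);
    neg accumulates negated products (inverted relationship).
    Both are reset to zero when they would go negative, so a match
    may cover only part of the time course (transient correlation).
    delay = j - i, so a positive delay means y lags behind x
    (a sign convention chosen for this sketch).
    """
    xs, ys = standardize(x), standardize(y)
    n, m = len(xs), len(ys)
    pos = [[0.0] * (m + 1) for _ in range(n + 1)]
    neg = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = (0.0, 0, False)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prod = xs[i - 1] * ys[j - 1]
            pos[i][j] = max(pos[i - 1][j - 1] + prod, 0.0)
            neg[i][j] = max(neg[i - 1][j - 1] - prod, 0.0)
            if pos[i][j] > best[0]:
                best = (pos[i][j], j - i, False)
            if neg[i][j] > best[0]:
                best = (neg[i][j], j - i, True)
    return best

# Toy profiles (hypothetical data): a pulse at t = 5 and the same pulse at t = 8.
x = [math.exp(-0.5 * (t - 5) ** 2) for t in range(20)]
y = [math.exp(-0.5 * (t - 8) ** 2) for t in range(20)]
delayed = local_similarity(x, y)
inverted = local_similarity(x, [-v for v in y])
```

Scores of this kind, computed for every pair of genes, form a similarity matrix that a clustering algorithm accepting arbitrary pairwise similarities, such as Affinity Propagation, could take as input. The cost is quadratic in the number of time points for each pair of profiles.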
