Temporal clustering by affinity propagation reveals transcriptional modules in Arabidopsis thaliana

MOTIVATION: Identifying regulatory modules is an important task in the exploratory analysis of gene expression time series data, and clustering algorithms are often used for this purpose. However, gene regulatory events may induce complex temporal features in a gene expression profile, including time delays, inversions and transient correlations, which current clustering methods do not handle well. As the cost of microarray experiments continues to fall, the temporal resolution of time course studies is increasing, so detailed temporal features of this kind increasingly need to be taken into account. Standard clustering methods are widely used and much studied, but their shared shortcomings with respect to such temporal features motivate the work presented here.

RESULTS: We introduce a temporal clustering approach for high-dimensional gene expression data that takes account of time delays, inversions and transient correlations. We do so by exploiting a recently introduced message-passing algorithm called Affinity Propagation (AP). Temporal features of interest are captured using an approximate but efficient dynamic programming approach due to Qian et al. The resulting method is effective at discerning non-obvious temporal features, yet efficient and robust enough for routine use as an exploratory tool. We show results on validated transcription factor-target pairs in yeast and on gene expression data from a study of Arabidopsis thaliana under pathogen infection. The latter reveals a number of biologically striking findings.

AVAILABILITY: Matlab code for our method is available at http://www.wsbc.warwick.ac.uk/stevenkiddle/tcap.html.
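To make the idea of a delay- and inversion-aware similarity concrete, the following is a minimal Python sketch of a local-alignment style dynamic programme over two expression profiles. It is an illustration under stated assumptions, not the authors' Matlab implementation: the function names `zscore` and `local_similarity` are hypothetical, profiles are assumed to be sampled at equal intervals, and each profile is assumed to be z-scored before comparison. One matrix accumulates products of normalised values along diagonals to find the best positively correlated stretch, a second accumulates negated products to find the best inverted stretch, and the diagonal offset of the best cell gives the time lag.

```python
import math


def zscore(profile):
    """Normalise a profile to zero mean and unit (population) variance."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in profile) / n)
    if sd == 0:
        raise ValueError("constant profile cannot be normalised")
    return [(v - mean) / sd for v in profile]


def local_similarity(x, y):
    """Return (score, lag, inverted) for the best local match of x and y.

    lag is the index in y minus the index in x along the best diagonal,
    so a positive lag means the matching stretch occurs later in y.
    inverted is True when the best match is between x and -y.
    """
    x, y = zscore(x), zscore(y)
    n, m = len(x), len(y)
    # pos tracks running sums for positive correlation, neg for inversion.
    pos = [[0.0] * (m + 1) for _ in range(n + 1)]
    neg = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = (0.0, 0, False)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prod = x[i - 1] * y[j - 1]
            # Resetting to zero lets a match start anywhere (transient match).
            pos[i][j] = max(pos[i - 1][j - 1] + prod, 0.0)
            neg[i][j] = max(neg[i - 1][j - 1] - prod, 0.0)
            if pos[i][j] > best[0]:
                best = (pos[i][j], j - i, False)
            if neg[i][j] > best[0]:
                best = (neg[i][j], j - i, True)
    return best


# Toy profiles (made up for illustration): y is x delayed by one time point.
x = [0, 0, 1, 3, 1, 0, 0, 0]
y = [0, 0, 0, 1, 3, 1, 0, 0]
score, lag, inverted = local_similarity(x, y)
print(score, lag, inverted)
```

Scores computed this way for every pair of genes could be assembled into a similarity matrix and handed to an affinity propagation implementation; how the scores are scaled and how the self-similarity (preference) values are chosen is left open here.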
