A temporal precedence-based clustering method for gene expression microarray data

Background
Time-course microarray experiments produce data that can help in understanding the underlying dynamics of a biological system. Clustering is an important stage of microarray data analysis, in which the data are grouped according to shared characteristics. Most clustering techniques rely on distance or visual similarity measures, which may be unsuitable for temporal microarray data, where the sequential ordering of time points matters. We present a Granger causality-based technique for clustering temporal microarray gene expression data. Granger causality measures the interdependence between two time series by statistically testing whether one time series is useful for forecasting the other.

Results
A gene-association matrix is constructed by applying the Granger causality test to each pair of genes. This matrix is then analyzed with a graph-theoretic technique to detect highly connected components, which represent candidate biological modules. We test our approach on synthetic datasets and on real biological datasets from Arabidopsis thaliana, and we assess its effectiveness by comparing the results with the existing biological literature. We also report structural properties of the association network that are commonly expected of biological systems.

Conclusions
Experiments on synthetic and real microarray datasets show that our approach produces encouraging results. The method is simple to implement and statistically traceable at each step. It produces sets of functionally related genes that can be used for further reverse-engineering of gene circuits.
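The two steps described above (pairwise Granger tests, then graph analysis of the resulting association matrix) can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation. For each ordered pair of genes it fits a restricted model y_t ~ c + sum_k a_k y_{t-k} and an unrestricted model that also includes sum_k b_k x_{t-k}, then computes F = ((RSS_r - RSS_u) / p) / (RSS_u / (T - 2p - 1)) with p lags and T usable time points. The lag order `lag` and the significance threshold `alpha` are hypothetical parameters, since the abstract does not state how the lag was chosen or how significance was decided, and no multiple-testing correction is applied. The graph step is a deliberately simple stand-in: it returns connected components of the undirected association graph, not the paper's own highly-connected-component method. The F tail probability uses the Numerical Recipes continued fraction for the regularized incomplete beta function. All function names (`f_sf`, `granger_pvalue`, `association_matrix`, `modules`) are hypothetical.

```python
import math
from itertools import permutations


def _betacf(a, b, x, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        # One even and one odd term of the continued fraction per iteration.
        for aa in (m * (b - m) * x / ((a - 1 + 2 * m) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 1 + 2 * m))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) of an F(d1, d2) distribution."""
    x = d2 / (d2 + d1 * f) if f > 0.0 else 1.0
    if x >= 1.0:
        return 1.0
    a, b = d2 / 2.0, d1 / 2.0  # P(F > f) = I_x(d2/2, d1/2)
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def _rss(X, y):
    """Residual sum of squares of the least-squares fit y ~ X."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * w for u, w in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((v - sum(b * xi for b, xi in zip(beta, r))) ** 2
               for r, v in zip(X, y))


def granger_pvalue(x, y, lag=1):
    """F-test p-value for H0: the past of x does not help forecast y."""
    rows = range(lag, len(y))
    target = [y[t] for t in rows]
    own = [[1.0] + [y[t - k] for k in range(1, lag + 1)] for t in rows]
    both = [r + [x[t - k] for k in range(1, lag + 1)] for r, t in zip(own, rows)]
    rss_r, rss_u = _rss(own, target), _rss(both, target)
    df_num, df_den = lag, len(target) - (2 * lag + 1)
    f_stat = max(0.0, (rss_r - rss_u) / df_num) / (rss_u / df_den)
    return f_sf(f_stat, df_num, df_den)


def association_matrix(series, lag=1, alpha=0.01):
    """adj[i][j] is True when gene i Granger-causes gene j at level alpha."""
    n = len(series)
    adj = [[False] * n for _ in range(n)]
    for i, j in permutations(range(n), 2):  # no multiple-testing correction
        adj[i][j] = granger_pvalue(series[i], series[j], lag) < alpha
    return adj


def modules(adj):
    """Connected components (edge direction ignored) with at least two genes."""
    parent = list(range(len(adj)))

    def find(u):
        while parent[u] != u:
            u = parent[u]
        return u

    for i, j in permutations(range(len(adj)), 2):
        if adj[i][j]:
            parent[find(i)] = find(j)
    groups = {}
    for u in range(len(adj)):
        groups.setdefault(find(u), []).append(u)
    return [g for g in groups.values() if len(g) > 1]
```

As a usage example, if `y` is built as 0.9 times the one-step-lagged profile `x` plus small noise and `z` is an unrelated profile, `association_matrix([x, y, z])[0][1]` is expected to be True and `modules(...)` to contain genes 0 and 1. The sketch needs at least 2·lag + 2 usable time points per regression and fails on a singular design (for example, a constant expression profile); a production version would use a numerically robust least-squares solver and a false-discovery-rate correction across all gene pairs.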
