Analysis of time-series gene expression data: methods, challenges, and opportunities.

Monitoring changes in expression patterns over time offers a distinct opportunity to unravel the mechanistic drivers of cellular responses. Gene arrays, which measure the mRNA expression levels of thousands of genes simultaneously, provide the high-throughput data collection needed to obtain data at the scale required for understanding the complexities of living organisms. Unraveling the coherent, complex structures of transcriptional dynamics is the goal of a large family of computational methods that aim to extract more information from time-course gene expression data. In this review, we summarize the qualitative characteristics of these approaches, discuss the main challenges that this type of complex data presents, and, finally, explore the opportunities it offers for developing mechanistic models of cellular response.
