A method to identify differential expression profiles of time-course gene data with Fourier transformation

Background

Time-course gene expression experiments are an increasingly popular method for exploring biological processes. Temporal gene expression profiles provide an important characterization of gene function, as biological systems are both developmental and dynamic. With such data it is possible to study gene expression changes over time and thereby to detect differentially expressed genes. Much of the early work on analyzing time series expression data relied on methods developed originally for static data, and thus there is a need for improved methodology. Since time series expression is a temporal process, its unique features, such as autocorrelation between successive points, should be incorporated into the analysis.

Results

This work aims to identify genes that show different gene expression profiles across time. We propose a statistical procedure to discover gene groups with similar profiles using a nonparametric representation that accounts for the autocorrelation in the data. In particular, we first represent each profile in terms of a Fourier basis, and then we screen out genes that are not differentially expressed based on the Fourier coefficients. Finally, we cluster the remaining gene profiles using a model-based approach in the Fourier domain. We evaluate the screening results in terms of sensitivity, specificity, FDR and FNR, compare them with Gaussian process regression screening in a simulation study, and illustrate the results by application to yeast cell-cycle microarray expression data with alpha-factor synchronization.

The key elements of the proposed methodology are: (i) representation of gene profiles in the Fourier domain; (ii) automatic screening of genes based on the Fourier coefficients, taking into account autocorrelation in the data while controlling the false discovery rate (FDR); (iii) model-based clustering of the remaining gene profiles.

Conclusions

Using this method, we identified a set of cell-cycle-regulated yeast genes from the time-course data. The proposed method is general and can potentially be used to identify genes that share the same expression patterns or biological processes, and to help address present and forthcoming challenges of data analysis in functional genomics.
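The first two steps of the procedure (Fourier representation, then FDR-controlled screening) can be illustrated with a minimal sketch. It is not the authors' implementation. For simplicity it assumes independent Gaussian noise with a known variance, so it ignores the autocorrelation that the proposed method accounts for. The function names, the `noise_var` parameter and the chi-square test statistic are all assumptions made for illustration; only the Benjamini-Hochberg step-up rule is a standard procedure.

```python
import math
import numpy as np


def fourier_coefficients(profiles, n_harmonics):
    """Cosine (a) and sine (b) coefficients of the first harmonics, per gene."""
    profiles = np.asarray(profiles, dtype=float)
    n = profiles.shape[1]
    if not 1 <= n_harmonics <= (n - 1) // 2:
        raise ValueError("n_harmonics must lie strictly below the Nyquist index")
    centered = profiles - profiles.mean(axis=1, keepdims=True)
    spectrum = np.fft.rfft(centered, axis=1)[:, 1:n_harmonics + 1]
    a = 2.0 * spectrum.real / n    # a_k = (2/n) * sum_t y_t cos(2*pi*k*t/n)
    b = -2.0 * spectrum.imag / n   # b_k = (2/n) * sum_t y_t sin(2*pi*k*t/n)
    return a, b


def screening_pvalues(a, b, n_timepoints, noise_var):
    """P-values for 'all retained coefficients are zero'.

    ASSUMPTION (illustrative only): i.i.d. Gaussian noise with known variance,
    so the scaled squared coefficients sum to a chi-square variable with
    2K degrees of freedom. Autocorrelation is NOT modelled here.
    """
    n_harm = a.shape[1]
    half = n_timepoints * (a ** 2 + b ** 2).sum(axis=1) / (4.0 * noise_var)
    # Closed-form chi-square survival function for an even number of dof.
    series = sum(half ** j / math.factorial(j) for j in range(n_harm))
    return np.exp(-half) * series


def benjamini_hochberg(pvalues, alpha=0.05):
    """Boolean mask of hypotheses rejected by the BH step-up procedure."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    keep = np.zeros(m, dtype=bool)
    if below.any():
        last = np.nonzero(below)[0].max()  # largest rank meeting its threshold
        keep[order[:last + 1]] = True
    return keep


# Toy data: 50 genes, 18 time points, the first 5 genes carry a sinusoid.
rng = np.random.default_rng(0)
t = np.arange(18)
data = rng.normal(0.0, 0.2, size=(50, 18))
data[:5] += np.sin(2 * np.pi * t / 18)

a, b = fourier_coefficients(data, n_harmonics=2)
pvals = screening_pvalues(a, b, n_timepoints=18, noise_var=0.2 ** 2)
selected = benjamini_hochberg(pvals, alpha=0.05)
print("genes kept after screening:", int(selected.sum()))
```

The retained coefficient vectors (`a` and `b` for the selected genes) would then be the input to a model-based clustering step, for example a Gaussian mixture model. That step is omitted here.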
