More powerful significant testing for time course gene expression data using functional principal component analysis approaches

Background
One of the fundamental problems in time course gene expression data analysis is to identify genes associated with a biological process or with a particular stimulus of interest, such as a treatment or a viral infection. Most existing methods for this problem are designed for data with longitudinal replicates. In practice, however, many time course gene expression experiments have no replicates or only a small number of independent replicates.

Results
We focus on the case without replicates and propose a new method for identifying differentially expressed genes by incorporating functional principal component analysis (FPCA) into a hypothesis testing framework. The data-driven eigenfunctions give a flexible and parsimonious representation of time course gene expression trajectories, leaving more degrees of freedom for inference than a prespecified basis does. Moreover, information from all genes is borrowed for the inference on each individual gene.

Conclusion
The proposed approach is more powerful than existing methods at identifying differentially expressed genes in time course data. The improved performance is demonstrated through simulation studies and a real data application to the Saccharomyces cerevisiae cell cycle data.
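The testing idea in the Results paragraph can be illustrated with a minimal Python/NumPy sketch. It is not the authors' procedure. It assumes every gene is measured once at each point of a common, dense time grid, skips the smoothing step that FPCA normally uses, and takes the null hypothesis to be a flat (time-constant) profile. The eigenfunctions are estimated from the covariance pooled over all genes, so each gene's test uses a basis learned from the whole data set. Each gene is then tested with an ordinary nested-model F statistic (flat profile versus constant + pooled mean curve + K leading eigenfunctions), calibrated by shuffling that gene's time points. The function names (`fpca_basis`, `f_statistics`, `permutation_pvalues`, `simulate_genes`), the F statistic, the permutation calibration and the simulated data are all illustrative assumptions.

```python
import numpy as np


def fpca_basis(Y, n_components):
    """Discretized FPCA basis pooled over all genes (illustrative sketch).

    Y: (genes x time points) matrix on a common grid, one profile per gene
    (no replicates). Returns an orthonormal T x (n_components + 2) matrix
    spanning the constant, the pooled mean curve and the leading
    eigenvectors of the pooled covariance.
    """
    T = Y.shape[1]
    # Remove each gene's own level: a flat profile is the null model.
    Yc = Y - Y.mean(axis=1, keepdims=True)
    mu = Yc.mean(axis=0)                       # pooled mean curve
    C = np.cov(Yc, rowvar=False)               # pooled T x T covariance
    _, eigvecs = np.linalg.eigh(C)             # eigenvalues in ascending order
    phi = eigvecs[:, ::-1][:, :n_components]   # leading eigenvectors
    Q, _ = np.linalg.qr(np.column_stack([np.ones(T), mu, phi]))
    return Q


def f_statistics(Y, Q):
    """Per-gene F statistic: flat profile (H0) vs. profile in span(Q) (H1)."""
    T, q = Q.shape
    rss0 = np.sum((Y - Y.mean(axis=1, keepdims=True)) ** 2, axis=1)
    rss1 = np.sum((Y - Y @ Q @ Q.T) ** 2, axis=1)
    return ((rss0 - rss1) / (q - 1)) / (rss1 / (T - q))


def permutation_pvalues(Y, Q, n_perm=200, seed=0):
    """Permutation p-values, shuffling time points independently per gene.

    Reasonable under H0 if the noise is exchangeable over time; the basis Q
    is held fixed, so this is an approximation, not an exact test.
    """
    rng = np.random.default_rng(seed)
    observed = f_statistics(Y, Q)
    exceed = np.zeros(Y.shape[0])
    for _ in range(n_perm):
        exceed += f_statistics(rng.permuted(Y, axis=1), Q) >= observed
    return (exceed + 1) / (n_perm + 1)


def simulate_genes(n_de=50, n_null=450, n_times=12, seed=1):
    """Hypothetical toy data: sinusoidal DE genes first, then flat genes."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0, 1, n_times)
    amp = rng.normal(2.0, 0.5, size=(n_de, 1))
    phase = rng.uniform(0, 2 * np.pi, size=(n_de, 1))
    signal = np.vstack([amp * np.sin(2 * np.pi * t + phase),
                        np.zeros((n_null, n_times))])
    level = rng.normal(5.0, 1.0, size=(n_de + n_null, 1))  # gene-specific baseline
    return signal + level + rng.normal(0, 0.5, size=signal.shape)


if __name__ == "__main__":
    Y = simulate_genes()
    Q = fpca_basis(Y, n_components=2)
    p = permutation_pvalues(Y, Q)
    print("DE genes with p < 0.05:", np.mean(p[:50] < 0.05))
    print("null genes with p < 0.05:", np.mean(p[50:] < 0.05))
```

With K eigenfunctions the alternative model in this sketch uses K + 2 parameters, leaving T - K - 2 residual degrees of freedom per gene; choosing a small K that still captures the dominant temporal patterns is what keeps that count usable when the number of time points T is small.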