Statistical significance analysis of longitudinal gene expression data

MOTIVATION: Time-course microarray experiments are designed to study how biological processes unfold over time. Longitudinal gene expression data arise when biological samples taken from the same subject at different time points are used to measure gene expression levels. It has been observed that the expression patterns of samples of a given tumor measured at different time points are likely to be much more similar to each other than the expression patterns of tumor samples of the same type taken from different subjects. In statistics, this phenomenon is called the within-subject correlation of repeated measurements on the same subject, and the resulting data are called longitudinal data. It is well known from other applications that a valid statistical analysis must appropriately account for the possible within-subject correlation in longitudinal data.

RESULTS: We apply estimating equation techniques to construct a robust statistic for detecting genes with temporal changes in expression. The statistic is a variant of the robust Wald statistic and accounts for the potential within-subject correlation of longitudinal gene expression data. To identify significant genes, we assign significance levels to the proposed statistic either by incorporating the idea of the significance analysis of microarrays method or by using the mixture model method. The utility of the statistic is demonstrated by applying it to an important study of osteoblast lineage-specific differentiation. Using simulated data, we also show the pitfalls of drawing statistical inference when the within-subject correlation in longitudinal gene expression data is ignored.
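To make the idea of a robust Wald statistic concrete, the following is a minimal sketch for a single gene. It is not the specific variant proposed in the paper, whose details the abstract does not give. It assumes a balanced design (every subject measured at every time point), a saturated mean model with one mean per time point, a working-independence estimating equation, and more subjects than time points so that the covariance matrix is invertible. The function name `robust_wald` and the array layout are illustrative choices.

```python
import numpy as np

def robust_wald(y):
    """Robust (sandwich) Wald statistic for H0: equal mean expression at all times.

    y: array of shape (n_subjects, n_timepoints) for ONE gene (assumed layout).
    Assumes n_subjects > n_timepoints so the covariance below is invertible.
    """
    y = np.asarray(y, dtype=float)
    n, t = y.shape

    # Working-independence estimate of the per-time-point means.
    mean = y.mean(axis=0)

    # Sandwich covariance of the estimated means: sum_i r_i r_i^T / n^2.
    # Using whole-subject residual vectors r_i keeps the within-subject
    # correlation in the variance estimate instead of assuming it away.
    resid = y - mean
    cov_mean = resid.T @ resid / n**2

    # Contrast matrix of successive differences mu_{k+1} - mu_k, shape (t-1, t).
    contrast = np.diff(np.eye(t), axis=0)
    diff = contrast @ mean
    cov_diff = contrast @ cov_mean @ contrast.T

    # Wald quadratic form; asymptotically chi-square with t-1 degrees of
    # freedom under H0 when the number of subjects is large.
    return float(diff @ np.linalg.solve(cov_diff, diff))
```

Because the statistic is built from within-subject contrasts, adding a constant subject-specific offset to all of a subject's measurements leaves it unchanged. With the small numbers of subjects typical of microarray studies, the chi-square reference distribution may be unreliable, which is one reason to calibrate significance by resampling or mixture modelling, as the abstract describes.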
