Statistical methods for analysis of time course gene expression data.

Because many biological systems and regulatory networks are dynamic, gene expression levels measured at different time points during a biological process can often provide more insight into the underlying system than measurements at a single time point. Such data are often called time-course gene expression data. A distinctive feature of these data is the temporal dependency among the expression levels of a given gene at different times, or between the expression levels of two different genes. Statistical analysis needs to account for this dependency in order to make valid inferences. This paper presents several statistical methods for analyzing time-course gene expression data: the time-lagged correlation coefficient for analyzing the relationship between genes; a mixed-effects model with splines for clustering genes and for estimating missing gene expression data; and a new method for aligning gene expression profiles obtained under two experimental conditions and for identifying gene clusters that show significant changes between the two conditions. We used yeast cell cycle gene expression data sets to illustrate these methods and obtained biologically meaningful conclusions from the analyses.
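To make the idea of a time-lagged correlation coefficient concrete, the following is a minimal sketch and not necessarily the exact estimator used in the paper. It assumes two expression profiles sampled at the same equally spaced time points, computes the ordinary Pearson correlation between one profile and a shifted copy of the other, and picks the lag with the largest absolute correlation. The function names `lagged_correlation` and `best_lag`, and the rule for choosing the lag, are illustrative assumptions.

```python
import numpy as np


def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] over overlapping time points.

    Assumes x and y are 1-D profiles sampled at the same equally spaced times.
    A positive lag means changes in y follow changes in x by `lag` time steps.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    n = x.size
    if abs(lag) >= n - 1:
        raise ValueError("lag leaves fewer than two overlapping time points")
    if lag >= 0:
        a, b = x[:n - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:n + lag]
    return float(np.corrcoef(a, b)[0, 1])


def best_lag(x, y, max_lag):
    """Return (lag, correlation) with the largest absolute correlation.

    Searching lags in [-max_lag, max_lag] and ranking by absolute value is an
    illustrative choice, not a rule taken from the paper.
    """
    scores = {k: lagged_correlation(x, y, k) for k in range(-max_lag, max_lag + 1)}
    k = max(scores, key=lambda lag: abs(scores[lag]))
    return k, scores[k]
```

With short time courses, large lags leave only a few overlapping points, so the resulting correlations are noisy; keeping `max_lag` small relative to the number of time points is a sensible precaution under these assumptions.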