Clustering longitudinal profiles using P-splines and mixed effects models applied to time-course gene expression data

Longitudinal data are becoming increasingly common, and various methods have been developed to analyze them. This paper investigates profiles from time-course gene expression studies, in which cluster analysis plays an important role in identifying groups of genes that are co-expressed over time. A number of procedures have been used to cluster time-course gene expression data, but the techniques previously described have many limitations. An alternative approach is proposed that aims to alleviate some of these limitations. The method exploits the connection between the linear mixed effects model and P-spline smoothing to do two things simultaneously: smooth the gene expression data to remove measurement error and noise, and cluster the expression profiles using finite mixtures of mixed effects models. This approach has a number of advantages, including reduced computation time and ease of implementation in standard software packages.
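
The connection between penalized spline smoothing and the linear mixed effects model can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the abstract: a truncated-line basis stands in for the B-spline basis with difference penalty used in P-splines, the smoothing parameter `lam` is supplied by hand rather than estimated, and the function names are hypothetical. In the mixed model view, the profile is written as y = Xβ + Zu + e, with the spline coefficients u treated as random effects; penalizing u with lam = σ²_e / σ²_u then gives the same fitted curve as the mixed model predictor.

```python
import numpy as np


def truncated_line_basis(t, knots):
    """Build the fixed-effect design X = [1, t] and the
    random-effect design Z with columns (t - knot)_+."""
    t = np.asarray(t, dtype=float)
    knots = np.asarray(knots, dtype=float)
    X = np.column_stack([np.ones_like(t), t])
    Z = np.maximum(t[:, None] - knots[None, :], 0.0)
    return X, Z


def penalized_spline_fit(t, y, knots, lam):
    """Smooth one expression profile by penalized least squares.

    Only the spline coefficients u are penalized; the intercept and
    slope are left unpenalized, as fixed effects would be.
    lam plays the role of sigma_e^2 / sigma_u^2 (assumed known here).
    """
    X, Z = truncated_line_basis(t, knots)
    C = np.hstack([X, Z])
    # Penalty matrix: zeros for the fixed effects, ones for the random effects.
    D = np.diag([0.0] * X.shape[1] + [1.0] * Z.shape[1])
    coef = np.linalg.solve(C.T @ C + lam * D, C.T @ np.asarray(y, dtype=float))
    return C @ coef, coef


# Usage on a synthetic profile (illustrative data, not from the paper).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.2, size=t.size)
knots = np.linspace(0.1, 0.9, 8)
smoothed, coef = penalized_spline_fit(t, y, knots, lam=0.1)
```

In practice, fitting the model with mixed model software would estimate the variance components, and hence the amount of smoothing, from the data. The clustering step described above would additionally place a finite mixture over such models, which this sketch does not attempt.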
