Gene time series data clustering based on continuous representations and an energy based similarity measure

Clustering of temporal gene expression data has been widely used to study dynamic biological systems. However, temporal gene expression data often contain noise, missing data points, and non-uniformly sampled time points, which make it difficult for traditional clustering methods to extract meaningful information. To improve clustering performance, we introduce a novel clustering approach based on continuous representations and an energy-based similarity measure. The proposed approach models each gene expression profile as a B-spline expansion whose coefficients are estimated from the observed data by a regularized least squares scheme. After fitting the continuous representations of the gene expression profiles, we use an energy-based similarity measure that takes into account the temporal information and the relative changes of the time series. Experimental results show that the proposed method is robust to noise and can produce meaningful clustering results.
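The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the penalty, the knot placement, or the exact form of the similarity measure, so the ridge penalty `lam`, the clamped cubic knot vector, the discrete symmetric cross Teager-Kaiser energy, and the normalization in `energy_similarity` are all assumptions chosen for illustration, as are the function names and the synthetic sine profiles.

```python
import math

def bspline_basis(i, order, t, knots):
    """Cox-de Boor recursion for the i-th B-spline of the given order at time t."""
    if order == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    value = 0.0
    span_left = knots[i + order - 1] - knots[i]
    span_right = knots[i + order] - knots[i + 1]
    if span_left > 0:
        value += (t - knots[i]) / span_left * bspline_basis(i, order - 1, t, knots)
    if span_right > 0:
        value += (knots[i + order] - t) / span_right * bspline_basis(i + 1, order - 1, t, knots)
    return value

def design_matrix(times, knots, order):
    """One row per time point, one column per basis function."""
    n_basis = len(knots) - order
    rows = []
    for t in times:
        t = min(t, knots[-1] - 1e-9)  # keep the right endpoint inside the last interval
        rows.append([bspline_basis(i, order, t, knots) for i in range(n_basis)])
    return rows

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def fit_spline(times, values, knots, order=4, lam=1e-4):
    """Assumed ridge form: minimize ||B c - y||^2 + lam * ||c||^2 over coefficients c."""
    B = design_matrix(times, knots, order)
    n = len(B[0])
    BtB = [[sum(row[a] * row[b] for row in B) + (lam if a == b else 0.0)
            for b in range(n)] for a in range(n)]
    Bty = [sum(row[a] * y for row, y in zip(B, values)) for a in range(n)]
    return solve(BtB, Bty)

def evaluate(coefs, knots, grid, order=4):
    """Evaluate the fitted continuous representation on a grid of times."""
    return [sum(c * b for c, b in zip(coefs, row))
            for row in design_matrix(grid, knots, order)]

def cross_energy(x, y):
    """Discrete symmetric cross Teager-Kaiser energy; x[n]^2 - x[n-1]*x[n+1] when y is x."""
    return [x[n] * y[n] - 0.5 * (x[n - 1] * y[n + 1] + x[n + 1] * y[n - 1])
            for n in range(1, len(x) - 1)]

def energy_similarity(x, y):
    """Assumed normalization: summed cross energy divided by the mean of the self energies."""
    denom = sum(cross_energy(x, x)) + sum(cross_energy(y, y))
    return 2.0 * sum(cross_energy(x, y)) / denom if denom != 0 else 0.0

# Synthetic, non-uniformly sampled profiles (illustrative data, not from the paper).
times = [0.0, 0.4, 0.9, 1.2, 2.0, 2.3, 3.1, 3.5, 4.2, 4.8, 5.1, 5.6, 6.0]
knots = [0.0] * 4 + [1.0, 2.0, 3.0, 4.0, 5.0] + [6.0] * 4  # clamped cubic knots
grid = [0.1 * n for n in range(61)]  # uniform grid for the energy operator
gene_a = [math.sin(t) for t in times]
gene_b = [math.sin(t + 0.3) for t in times]
curve_a = evaluate(fit_spline(times, gene_a, knots), knots, grid)
curve_b = evaluate(fit_spline(times, gene_b, knots), knots, grid)
print(energy_similarity(curve_a, curve_b))
```

Resampling the fitted curves on a uniform grid is what lets the discrete energy operator be applied even though the original samples are unevenly spaced. A pairwise matrix of such similarities could then be passed to any standard clustering routine.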
