Optimal classification for time-course gene expression data using functional data analysis

Classification problems have received considerable attention in biological and medical applications. In particular, classification methods combined with microarray technology play an important role in diagnosing and predicting diseases, such as cancer, in medical research. The primary objective of classification is to build an optimal classifier from a training sample in order to predict the unknown class labels in a test sample. In this paper, we propose a unified approach to optimal gene classification that uses functional principal component analysis (FPCA) within a functional data analysis (FNDA) framework to classify time-course gene expression profiles based on the information in their patterns. To derive an optimal classifier in FNDA, we also propose selecting the optimal number of basis functions in the smoothing step and the optimal number of functional principal components in FPCA by cross-validation, and we compare the performance of several popular classification techniques in the proposed setting. We illustrate the proposed method with a simulation study and a real-world data analysis.
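The pipeline outlined above (smooth each profile with a basis expansion, reduce it with FPCA, classify on the component scores, and choose the number of basis functions and components by cross-validation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the Fourier basis, the nearest-centroid classifier, the simulated two-class data, and all function names (`fourier_basis`, `smooth`, `fit_fpca`, `predict`, `cv_error`, `select`) are hypothetical choices made for the example, and FPCA is approximated by PCA of the smoothed curves on the sampling grid.

```python
import numpy as np

def fourier_basis(t, n_basis):
    """Evaluate the first n_basis Fourier functions on t in [0, 1) (assumed basis)."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)

def smooth(X, t, n_basis):
    """Smoothing step: least-squares fit of each row of X to the basis."""
    B = fourier_basis(t, n_basis)
    coef, *_ = np.linalg.lstsq(B, X.T, rcond=None)
    return (B @ coef).T

def fit_fpca(X_smooth, n_comp):
    """Discretised FPCA: PCA of the smoothed curves on the time grid."""
    mean = X_smooth.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_smooth - mean, full_matrices=False)
    return mean, Vt[:n_comp]

def predict(X_tr, y_tr, X_te, t, n_basis, n_comp):
    """Nearest-centroid rule on FPCA scores (a stand-in for any classifier)."""
    S_tr_smooth = smooth(X_tr, t, n_basis)
    mean, comps = fit_fpca(S_tr_smooth, n_comp)
    S_tr = (S_tr_smooth - mean) @ comps.T
    S_te = (smooth(X_te, t, n_basis) - mean) @ comps.T
    classes = np.unique(y_tr)
    centroids = np.stack([S_tr[y_tr == c].mean(axis=0) for c in classes])
    dist = ((S_te[:, None, :] - centroids[None]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def cv_error(X, y, t, n_basis, n_comp, n_folds=5, seed=0):
    """K-fold cross-validated misclassification rate for one (n_basis, n_comp)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    n_wrong = 0
    for fold in folds:
        train = np.ones(len(y), dtype=bool)
        train[fold] = False
        y_hat = predict(X[train], y[train], X[fold], t, n_basis, n_comp)
        n_wrong += np.sum(y_hat != y[fold])
    return n_wrong / len(y)

def select(X, y, t, basis_grid, comp_grid):
    """Grid search for the pair with the smallest cross-validated error."""
    best = None
    for n_basis in basis_grid:
        for n_comp in comp_grid:
            if n_comp > n_basis:
                continue  # smoothed curves span at most n_basis dimensions
            err = cv_error(X, y, t, n_basis, n_comp)
            if best is None or err < best[0]:
                best = (err, n_basis, n_comp)
    return best

# Simulated time-course profiles: two classes with different mean curves.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 18, endpoint=False)
y = np.repeat([0, 1], 40)
means = np.where(y[:, None] == 0, np.sin(2 * np.pi * t), np.cos(2 * np.pi * t))
X = means + rng.normal(scale=0.5, size=means.shape)

best_err, best_n_basis, best_n_comp = select(X, y, t, [3, 5, 7], [1, 2, 3])
print(best_err, best_n_basis, best_n_comp)
```

Any other classifier can be swapped in for the nearest-centroid rule inside `predict`, so the same cross-validation loop can be reused to compare classification techniques on the FPCA scores.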
