Analysis of SAGE Results with Combined Learning Techniques

Serial analysis of gene expression (SAGE) experiments can provide the expression levels of thousands of genes in a biological sample. However, the number of available samples is relatively small. Such an undersampled problem needs to be analyzed carefully from a machine learning perspective. In this paper, we combine several state-of-the-art techniques for classification, feature selection, and error estimation, and evaluate the performance of the combined techniques on the SAGE dataset. Our results show that a novel algorithm, the support vector machine with the stump kernel, performs well on the SAGE dataset both for building an accurate classifier and for selecting relevant features.
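As a rough illustration of the stump kernel mentioned above, the following minimal sketch assumes the simplified form K(x, x') = Δ − ||x − x'||_1 used in the infinite ensemble learning literature, with Δ taken as half the sum of the per-feature ranges. The abstract itself does not define the kernel, so this form, the function names, and the toy tag counts are all assumptions made for illustration, not the authors' implementation.

```python
def stump_kernel(x, z, delta):
    """Assumed stump kernel: delta minus the L1 distance between x and z."""
    if len(x) != len(z):
        raise ValueError("samples must have the same number of features")
    return delta - sum(abs(a - b) for a, b in zip(x, z))


def gram_matrix(samples, delta):
    """Pairwise kernel values for a list of equal-length feature vectors."""
    return [[stump_kernel(a, b, delta) for b in samples] for a in samples]


# Hypothetical toy data: 3 samples, each with counts for 4 SAGE tags.
samples = [
    [0, 3, 1, 7],
    [0, 2, 1, 9],
    [5, 0, 4, 0],
]

# Assumed choice of delta: half the sum of the per-feature ranges.
delta = 0.5 * sum(max(col) - min(col) for col in zip(*samples))

K = gram_matrix(samples, delta)
for row in K:
    print(row)
```

A Gram matrix of this kind could then be passed to any SVM solver that accepts a precomputed kernel. Because the kernel depends only on absolute differences per feature, it needs no width parameter of the kind a Gaussian kernel requires, which may be convenient when few samples are available for tuning.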
