Paper Information - Analysis of SAGE Results with Combined Learning Techniques

Serial analysis of gene expression (SAGE) experiments can provide the expression levels of thousands of genes in a biological sample. However, the number of available samples is relatively small. Such an undersampled problem needs to be carefully analyzed from a machine learning perspective. In this paper, we combine several state-of-the-art techniques for classification, feature selection, and error estimation, and evaluate the performance of the combined techniques on the SAGE dataset. Our results show that a novel algorithm, the support vector machine with the stump kernel, performs well on the SAGE dataset both for building an accurate classifier and for selecting relevant features.

Hsuan-Tien Lin | Ling Li
