Improving classification of microarray data using prototype-based feature selection

This paper addresses the problem of improving accuracy in the machine-learning task of classification from microarray data. A known difficulty specific to microarray data is the large number of inputs (genes) relative to the small number of available samples (conditions). A promising way to decrease the generalization error of classification algorithms is to perform gene selection, that is, to identify the genes that are potentially most relevant for the classification. Classical feature selection methods are based on direct statistical methods. We present a reduction algorithm based on the notion of a prototype gene. Each prototype represents a set of similar genes according to a given clustering method. We present experimental evidence of the usefulness of combining prototype-based feature selection with statistical gene selection methods for the task of classifying adenocarcinoma from gene expression data.
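The idea of a prototype gene can be illustrated with a minimal sketch. The abstract does not say which clustering method or which prototype definition is used, so everything below is an assumption made for illustration: plain k-means with a deterministic farthest-first initialisation stands in for the "given clustering method", the cluster mean profile stands in for the prototype, and the function name `prototype_genes` and the synthetic data are hypothetical.

```python
import numpy as np

def prototype_genes(expr, n_prototypes, n_iter=100):
    """Cluster genes (rows of expr, shape n_genes x n_samples) and return
    one prototype profile per cluster plus the cluster label of each gene.

    Assumption: k-means is the clustering method and the cluster mean is
    the prototype; the source does not specify either choice.
    """
    expr = np.asarray(expr, dtype=float)
    # Farthest-first initialisation: start from the first gene, then keep
    # adding the gene that is farthest from all centroids chosen so far.
    centroids = [expr[0]]
    for _ in range(1, n_prototypes):
        d = np.min([((expr - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(expr[d.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Squared Euclidean distance from every gene to every centroid.
        dists = ((expr[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        updated = centroids.copy()
        for k in range(n_prototypes):
            members = expr[labels == k]
            if len(members):  # keep the old centroid if a cluster is empty
                updated[k] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids, labels

# Synthetic example: 10 genes measured on 6 samples, in two groups of
# genes with opposite expression patterns (illustrative data only).
rng = np.random.default_rng(0)
pattern_a = np.array([0, 0, 0, 5, 5, 5], dtype=float)
pattern_b = pattern_a[::-1]
expr = np.vstack([
    pattern_a + 0.1 * rng.standard_normal((5, 6)),
    pattern_b + 0.1 * rng.standard_normal((5, 6)),
])

prototypes, labels = prototype_genes(expr, n_prototypes=2)
# Reduced design matrix for a classifier: samples x prototypes.
reduced = prototypes.T
```

Here the 10 original gene features are replaced by 2 prototype features per sample. A statistical gene selection score could then be applied to the prototypes or to the genes within each cluster; which ordering the paper uses is not stated in the abstract.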
