κ-NN for the classification of human cancer samples using the gene expression profiles.

The k-Nearest Neighbor (k-NN) classifier has been applied to the identification of cancer samples using the gene expression profiles with encouraging results. However, the performance of k-NN depends strongly on the distance considered to evaluate the sample proximities. Besides, the choice of a good dissimilarity is a difficult task and depends on the problem at hand. In this chapter, we introduce a method to learn the metric from the data to improve the k-NN classifier. To this aim, we consider a regularized version of the kernel alignment algorithm that incorporates a term that penalizes the complexity of the family of distances avoiding overfitting. The error function is optimized using a semidefinite programming approach (SDP). The method proposed has been applied to the challenging problem of cancer identification using the gene expression profiles. Kernel alignment k-NN outperforms other metric learning strategies and improves the classical k-NN algorithm.
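The abstract does not give the regularized SDP formulation, so it is not reproduced here. As a rough illustration of the two ingredients it names, the sketch below computes the empirical alignment between a kernel matrix and the ideal kernel built from the labels, then runs k-NN on the distance induced by that kernel. Everything in it is an assumption made for illustration: the function names are hypothetical, the toy data are invented, a plain linear kernel stands in for the learned one, and NumPy is used instead of an SDP solver.

```python
import numpy as np


def kernel_alignment(K1, K2):
    """Empirical alignment <K1, K2>_F / (||K1||_F * ||K2||_F)."""
    return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))


def kernel_to_sq_distance(K):
    """Squared distance induced by a kernel: d2(i, j) = K_ii + K_jj - 2 K_ij."""
    diag = np.diag(K)
    return diag[:, None] + diag[None, :] - 2.0 * K


def knn_predict(D_test_train, y_train, k=3):
    """Majority vote among the k nearest training samples (labels in {-1, +1})."""
    nearest = np.argsort(D_test_train, axis=1)[:, :k]
    votes = y_train[nearest].sum(axis=1)
    return np.where(votes >= 0, 1, -1)


# Toy stand-in for expression profiles: two well-separated classes (invented data).
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([-1, -1, -1, 1, 1, 1])
X_test = np.array([[0.5, 0.5], [5.5, 5.5]])

# Alignment of a linear kernel with the ideal kernel y y^T on the training set.
K_train = X_train @ X_train.T
alignment = kernel_alignment(K_train, np.outer(y_train, y_train))

# k-NN on the kernel-induced distance between test and training samples.
X_all = np.vstack([X_train, X_test])
D_all = kernel_to_sq_distance(X_all @ X_all.T)
n_train = len(X_train)
pred = knn_predict(D_all[n_train:, :n_train], y_train, k=3)

print("alignment with ideal kernel:", alignment)
print("predicted labels:", pred)
```

In a metric-learning setting, the kernel passed to `kernel_to_sq_distance` would be the one chosen to maximize alignment with the labels; the fixed linear kernel here only shows how an alignment score and a kernel-induced distance fit together.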

Manuel Martín-Merino
