Gene Selection Using Random Voronoi Ensembles

In this paper we propose a flexible method for assessing the relevance of input variables in high-dimensional problems with respect to a given dichotomic (two-class) classification task. Both linear and non-linear cases are considered. In the linear case, derivative-based saliency reduces to a commonly adopted ranking criterion, since the partial derivative of a linear discriminant with respect to each input is simply that input's weight. In the non-linear case, the method is extended by introducing a resampling technique and by clustering the obtained results to stabilize the estimate. The method was preliminarily validated on the data published by T.R. Golub et al. in a molecular-level study of two kinds of leukemia, Acute Myeloid Leukemia (AML) and Acute Lymphoblastic Leukemia (ALL) (Science 286(5439), 531-537, 1999). Our technique indicates that 8 of the 50 genes listed in the original work fall among the top 20 genes identified by the final cluster analysis, and therefore feature stronger discriminating power.
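The abstract does not spell out how derivative-based saliency is computed, so the following is only a minimal Python sketch under our own assumptions, not the authors' implementation. It takes any discriminant `f` (a hypothetical callable returning a score), estimates the mean absolute partial derivative of `f` with respect to each input by central finite differences over a set of sample points, and ranks the variables by that score. For a linear discriminant f(x) = w·x + b the partial derivative with respect to x_i is w_i, so the ranking reduces to sorting by |w_i|. The helper `bootstrap_top_k` is a hypothetical stand-in for the resampling step: instead of the paper's cluster analysis, it simply counts how often each variable lands in the top k across bootstrap resamples.

```python
import random
from statistics import mean


def saliency(f, samples, eps=1e-5):
    """Mean absolute partial derivative of f with respect to each input
    variable, estimated by central finite differences at every sample point.
    (Finite differences are an assumption; analytic gradients also work.)"""
    n_vars = len(samples[0])
    scores = []
    for i in range(n_vars):
        grads = []
        for x in samples:
            x_plus, x_minus = list(x), list(x)
            x_plus[i] += eps
            x_minus[i] -= eps
            grads.append(abs(f(x_plus) - f(x_minus)) / (2 * eps))
        scores.append(mean(grads))
    return scores


def rank_variables(scores):
    """Variable indices sorted by decreasing saliency."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def bootstrap_top_k(f, samples, k, n_rounds=50, seed=0):
    """Hypothetical stability check (replaces the paper's clustering step):
    count how often each variable appears in the top k across bootstrap
    resamples of the evaluation points."""
    rng = random.Random(seed)
    counts = [0] * len(samples[0])
    for _ in range(n_rounds):
        resample = [rng.choice(samples) for _ in samples]
        for i in rank_variables(saliency(f, resample))[:k]:
            counts[i] += 1
    return counts


# Usage: for a linear discriminant, the saliency of x_i is |w_i|.
w, b = [0.5, -2.0, 0.1], 0.3


def linear(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


rng = random.Random(1)
pts = [[rng.gauss(0, 1) for _ in w] for _ in range(20)]
print(rank_variables(saliency(linear, pts)))  # [1, 0, 2]
print(bootstrap_top_k(linear, pts, k=1))      # [0, 50, 0]
```

In the linear case the saliency is the same at every point, so resampling cannot change the ranking; the resampling only becomes informative for a non-linear `f`, whose derivatives vary across the input space.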

[1] Nello Cristianini et al. An introduction to Support Vector Machines. 2000.

[2] L. K. Buehler et al. Normalizing DNA microarray data. Current Issues in Molecular Biology, 2002.

[3] James C. Bezdek et al. Pattern Recognition with Fuzzy Objective Function Algorithms. Advanced Applications in Pattern Recognition, 1981.

[4] Frank Weller. Stability of Voronoi neighborship under perturbations of the sites. Canadian Conference on Computational Geometry (CCCG), 1997.

[5] Franz Aurenhammer et al. Voronoi diagrams: a survey of a fundamental geometric data structure. ACM Computing Surveys (CSUR), 1991.

[6] Thomas G. Dietterich. Machine-Learning Research. AI Magazine, 1997.

[7] K. Rose. Deterministic annealing for clustering, compression, classification, regression, and related optimization problems. Proceedings of the IEEE, 1998.

[8] James M. Keller et al. A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems, 1993.

[9] Marko Grobelnik et al. Feature Selection Using Linear Support Vector Machines. 2002.

[10] Tin Kam Ho et al. Complexity Measures of Supervised Classification Problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002.

[11] James M. Keller et al. The possibilistic C-means algorithm: insights and recommendations. IEEE Transactions on Fuzzy Systems, 1996.

[12] Richard O. Duda et al. Pattern classification and scene analysis. Wiley-Interscience, 1974.

[13] J. Mesirov et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 1999.

[14] Thomas G. Dietterich. Machine-Learning Research: Four Current Directions. 1997.

[15] Vikas Sindhwani et al. Information Theoretic Feature Crediting in Multiclass Support Vector Machines. SIAM International Conference on Data Mining (SDM), 2001.

[16] Yoshua Bengio et al. Pattern Recognition and Neural Networks. 1995.

[17] Rodolfo Zunino et al. Automated diagnosis and disease characterization using neural network analysis. Proceedings of the 1992 IEEE International Conference on Systems, Man, and Cybernetics, 1992.