Energy Supervised Relevance Neural Gas for Feature Ranking

In pattern classification, input features usually contribute unequally, according to their relevance to the specific classification task. In a previous paper, we introduced the Energy Supervised Relevance Neural Gas classifier, a kernel method that maximizes Onicescu's informational energy to compute the relevances of input features. These relevances were used to improve classification accuracy. In the present work, we focus on the feature ranking capability of this approach and compare our algorithm to standard feature ranking methods.
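Onicescu's informational energy of a discrete distribution is E = Σ pᵢ², the counterpart of Shannon entropy that measures how concentrated the distribution is. The sketch below, a hedged illustration and not the paper's ESRNG algorithm, computes this quantity and uses a simple energy-based heuristic to rank discretized features by how much they concentrate the joint feature/class distribution (function names and the binning scheme are our own illustrative choices):

```python
import numpy as np

def informational_energy(p):
    """Onicescu's informational energy of a discrete distribution: E = sum(p_i^2).

    E ranges from 1/n (uniform over n outcomes) to 1 (degenerate distribution)."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()  # normalize to a probability distribution
    return float(np.sum(p ** 2))

def rank_features(X, y, bins=5):
    """Toy energy-based ranking heuristic (illustrative only, not ESRNG).

    Each feature is discretized into `bins` intervals; the feature is then
    scored by the informational energy of its joint distribution with the
    class labels. Features whose joint distribution is more concentrated
    (e.g. each bin dominated by one class) score higher."""
    classes = sorted(set(y))
    scores = []
    for j in range(X.shape[1]):
        # interior bin edges -> digitize maps values to bins 0..bins-1
        edges = np.histogram_bin_edges(X[:, j], bins=bins)[1:-1]
        f = np.digitize(X[:, j], edges)
        joint = np.zeros((bins, len(classes)))
        for fv, cv in zip(f, y):
            joint[fv, classes.index(cv)] += 1
        scores.append(informational_energy(joint.ravel()))
    # indices of features, highest joint energy first
    return np.argsort(scores)[::-1]
```

For a uniform distribution over four outcomes the energy is 0.25, and a feature that perfectly separates the classes yields a more concentrated joint distribution than a mixed one, so it ranks first under this heuristic.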
