A comparative study of neural-network feature weighting

Abstract

Many feature weighting methods have been proposed in recent years to evaluate feature saliencies. Neural-network (NN) feature weighting, a supervised method, is founded on the mapping from input features to output decisions and is implemented by evaluating the sensitivity of the network outputs to its inputs. Through training on sample data, an NN implicitly encodes the saliencies of its input features. The partial derivatives of the outputs with respect to the inputs of the trained NN are calculated to measure the sensitivity of the outputs to each input feature, so the implicit feature weighting of the NN is transformed into explicit feature weighting. The purpose of this paper is to probe further into the principle of NN feature weighting and to evaluate its performance through a comparison with state-of-the-art weighting methods under the same working conditions. The study is motivated by the lack of direct and comprehensive comparative studies of the NN feature weighting method. Experiments on UCI repository data sets, face data sets, and self-built data sets show that NN feature weighting achieves superior performance under different conditions and has promising prospects. Compared with other existing methods, NN feature weighting can be used under more complex conditions, provided that an NN can work under those conditions. The outputs, serving as decision data, can be labels, real numbers, or integers. In particular, when the outputs are continuous, feature weights can be calculated without discretizing them.
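To make the sensitivity idea concrete, the following is a minimal sketch of gradient-based feature weighting, assuming a PyTorch model. The function name `feature_weights`, the mean-absolute-gradient rule, and the final normalization are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch: feature weights as mean absolute input gradients
# of a trained network. Assumes PyTorch; the exact sensitivity measure
# and normalization in the paper may differ.
import torch
import torch.nn as nn

def feature_weights(model: nn.Module, X: torch.Tensor) -> torch.Tensor:
    """Estimate feature saliencies as mean |d output / d input| over the data."""
    X = X.clone().requires_grad_(True)
    out = model(X)  # shape: (n_samples, n_outputs)
    # Samples are independent, so one backward pass on the sum yields
    # per-sample input gradients (exact for a single-output network).
    out.sum().backward()
    sens = X.grad.abs().mean(dim=0)  # average sensitivity per feature
    return sens / sens.sum()         # normalize weights to sum to 1

# Hypothetical usage on a small regression network (continuous outputs,
# so no discretization is needed, as noted in the abstract):
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
X = torch.randn(100, 4)
print(feature_weights(model, X))
```

In practice the model would first be trained on labeled sample data; the weights are only meaningful once the network has learned the input-output mapping.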
