Supervised feature selection through Deep Neural Networks with pairwise connected structure

Abstract Feature selection is an important data preprocessing strategy that has been empirically shown to reduce feature dimensionality and enhance the performance of learning algorithms in practice. Typical sparse-learning-based models select features by removing those whose feature scores are zero. However, linear models struggle to capture the non-linear relations between features and responses. Deep Neural Networks (DNNs) have a strong capability to model such non-linear relations and have been employed for feature selection. In this paper, we introduce a novel deep Neural-network-based Feature Selection (NeuralFS) method to identify informative features. The model consists of a fully-connected network and a decision network, joined through a pairwise connected structure. The fully-connected network is the core component that transforms the features into their corresponding scores, and the decision network performs the final classification or regression. The pairwise connected structure can be regarded as a "bridge" between the two networks: its weights are fixed to the normalized input and are not trainable during model training. After optimization, the feature scores are obtained by evaluating the output of the fully-connected network. NeuralFS exploits the deep network to model non-linearity and also yields sparse feature scores without resorting to sparse regularization techniques. We apply the proposed method to both synthetic and benchmark datasets to demonstrate its effectiveness.
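
To make the described architecture concrete, below is a minimal PyTorch sketch of how such a model could be wired together. The layer widths, activations, normalization, and the exact realization of the pairwise connected "bridge" are assumptions for illustration, since the abstract does not specify them.

```python
# Minimal sketch of a NeuralFS-style architecture (assumed details: layer
# sizes, ReLU activations, L2 input normalization for the bridge weights).
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeuralFS(nn.Module):
    def __init__(self, n_features: int, n_classes: int, hidden: int = 64):
        super().__init__()
        # Fully-connected scoring network: maps an input sample to one score per feature.
        self.score_net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_features),
        )
        # Decision network: performs the final classification (or regression).
        self.decision_net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.score_net(x)                 # per-feature scores
        # Pairwise connected "bridge": its weights are the normalized input
        # itself, fixed (detached) so they carry no trainable parameters.
        bridge = F.normalize(x, dim=1).detach()
        weighted = scores * bridge                 # element-wise pairing of scores and input
        return self.decision_net(weighted)

    @torch.no_grad()
    def feature_scores(self, x: torch.Tensor) -> torch.Tensor:
        # After training, feature scores are read off the scoring network's output.
        return self.score_net(x).mean(dim=0)
```

In this sketch, selecting a feature subset would amount to ranking the entries of `feature_scores(X)` and keeping the largest ones; the scoring head and the decision head are trained jointly with an ordinary classification or regression loss.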
