Feature Selection Using Autoencoders

Feature selection plays a vital role in improving generalization accuracy in many classification tasks on high-dimensional datasets. The goal of feature selection is to choose a minimal subset of features that are both relevant and non-redundant. Autoencoders map data from the original feature space to a reduced, more informative feature space. In this paper, we propose a novel feature selection approach that traverses a trained autoencoder backwards, following its more probable links back to the input features. Experiments on five publicly available large datasets show that our approach yields significant accuracy gains over most state-of-the-art feature selection methods.
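Since the abstract gives only the high-level idea, the sketch below illustrates one plausible reading of it, not the authors' exact algorithm: train a one-hidden-layer autoencoder, treat large-magnitude encoder weights as the "more probable" links, and walk backwards from the hidden units to score input features. The dataset (load_digits), bottleneck width k, links kept per hidden unit top_links, and selected subset size m are all illustrative assumptions.

```python
# Minimal sketch (assumptions noted above, not the paper's exact method):
# score input features by backtracking from hidden units along the
# strongest (most "probable") encoder links of a trained autoencoder.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPRegressor

X, _ = load_digits(return_X_y=True)
X = (X - X.min()) / (X.max() - X.min() + 1e-12)   # scale inputs to [0, 1]

k = 32                                            # assumed bottleneck width
ae = MLPRegressor(hidden_layer_sizes=(k,), activation="logistic",
                  max_iter=500, random_state=0)
ae.fit(X, X)                                      # reconstruct inputs -> autoencoder

W = ae.coefs_[0]                                  # (n_features, k) encoder weights
scores = np.zeros(X.shape[1])
top_links = 5                                     # assumed links kept per hidden unit
for j in range(k):
    # walk back from hidden unit j along its strongest links,
    # crediting the input features they originate from
    strongest = np.argsort(-np.abs(W[:, j]))[:top_links]
    scores[strongest] += np.abs(W[strongest, j])

m = 10                                            # assumed number of selected features
selected = np.argsort(-scores)[:m]
print("selected feature indices:", selected)
```

The backward traversal needs only the trained encoder weights, so a deeper or denoising autoencoder could be substituted without changing the selection step; in practice m and k would be tuned by cross-validation.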
