Let the Data Choose its Features: Differentiable Unsupervised Feature Selection

Scientific observations often consist of a large number of variables (features). Identifying a subset of meaningful features is a task often neglected in unsupervised learning, despite its potential for revealing clear patterns hidden in the ambient space. In this paper, we present a method for unsupervised feature selection tailored to the task of clustering. We propose a differentiable loss function that combines the graph Laplacian with a gating mechanism based on a continuous approximation of Bernoulli random variables. The Laplacian is used to define a scoring term that favors low-frequency features, while the parameters of the Bernoulli variables are trained to select the most informative features. We mathematically motivate the proposed approach and show that, in the high-noise regime, it is crucial to compute the Laplacian on the gated inputs rather than on the full feature set. We demonstrate the efficacy of the proposed approach and its advantage over current baselines on several real-world examples.
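
The loss described above can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch implementation of the ingredients named in the abstract, not the authors' released code: clipped-Gaussian stochastic gates as the continuous approximation of Bernoulli variables, a Gaussian-kernel affinity graph built on the gated inputs, and a Laplacian-score-style term that rewards low-frequency (smooth) features. All function names, the sign convention of the score, and the hyperparameters are illustrative assumptions.

```python
import torch

def sample_gates(mu, sigma=0.5):
    # Relaxed Bernoulli gates: z_d = clip(mu_d + eps_d, 0, 1), eps_d ~ N(0, sigma^2).
    return torch.clamp(mu + sigma * torch.randn_like(mu), 0.0, 1.0)

def open_gate_penalty(mu, sigma=0.5):
    # Differentiable surrogate for the expected number of open gates:
    # sum_d P(z_d > 0) = sum_d Phi(mu_d / sigma).
    return torch.distributions.Normal(0.0, 1.0).cdf(mu / sigma).sum()

def affinity_and_degree(x, bandwidth=1.0):
    # Gaussian-kernel affinity W and degree vector d over the rows of x.
    diff = x.unsqueeze(1) - x.unsqueeze(0)            # (n, n, p) pairwise differences
    w = torch.exp(-diff.pow(2).sum(-1) / (2 * bandwidth**2))
    return w, w.sum(dim=1)

def gated_laplacian_loss(x, mu, lam=0.1, sigma=0.5, eps=1e-8):
    z = sample_gates(mu, sigma)                       # one gate per feature
    xg = x * z                                        # gated inputs
    w, d = affinity_and_degree(xg)                    # graph built on the *gated* data
    f = xg - xg.mean(dim=0)                           # center each gated feature
    smooth = (f * (w @ f)).sum(dim=0)                 # f^T W f, per feature
    scale = (f.pow(2) * d[:, None]).sum(dim=0) + eps  # f^T D f, per feature
    score = (smooth / scale).sum()                    # large for low-frequency features
    return -score + lam * open_gate_penalty(mu, sigma)

# Toy usage: 100 samples, 20 features; only the gate parameters are trained.
x = torch.randn(100, 20)
mu = torch.full((20,), 0.5, requires_grad=True)
opt = torch.optim.Adam([mu], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    gated_laplacian_loss(x, mu).backward()
    opt.step()
selected = (mu.detach() > 0).nonzero().squeeze(-1)    # indices of open gates
```

At evaluation time the gates can be made deterministic, z_d = clip(mu_d, 0, 1), so the features with mu_d > 0 form the selected subset. Note that in this sketch the affinity matrix is recomputed from the gated inputs at every step, reflecting the abstract's point that in the high-noise regime the Laplacian must be computed on the gated inputs rather than on the full feature set.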
