IVFS: Simple and Efficient Feature Selection for High Dimensional Topology Preservation

Feature selection is an important tool for dealing with high dimensional data. In the unsupervised case, many popular algorithms aim at maintaining the structure of the original data. In this paper, we propose a simple and effective feature selection algorithm that enhances sample similarity preservation from a new perspective, topology preservation, represented by persistence diagrams from computational topology. The method is built on a unified feature selection framework called IVFS, which is inspired by the random subset method. The scheme is flexible and can handle cases where the problem is analytically intractable. The proposed algorithm preserves both the pairwise distances and the topological patterns of the full data well. We demonstrate that the algorithm provides satisfactory performance even at a small sub-sampling rate, which supports efficient application of the proposed method to large-scale datasets. Extensive experiments validate the effectiveness of the proposed feature selection scheme.
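To make the random-subset scheme sketched in the abstract concrete, below is a minimal Python sketch of one possible instantiation: features are scored by how well randomly sampled feature subsets preserve pairwise distances on sub-sampled rows, and each feature's score is the average loss over the subsets that contained it. The function name `ivfs_scores`, the default parameter choices, and the plain l2 loss on distance vectors are illustrative assumptions, not the paper's exact specification; the topology-preserving variant would instead compare persistence diagrams (e.g., computed with a TDA library) rather than raw distances.

```python
import numpy as np
from scipy.spatial.distance import pdist

def ivfs_scores(X, n_subsets=1000, subset_size=None, n_rows=None, seed=0):
    """Random-subset feature scoring in the spirit of the IVFS framework.

    Each round draws a random feature subset and a random sub-sample of
    rows, then measures how well pairwise distances computed on the
    feature subset match those computed on all features. A feature's
    score is its average loss over the subsets that included it
    (lower = better structure preservation).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    subset_size = subset_size or max(1, p // 2)   # features per subset
    n_rows = n_rows or min(n, 100)                # rows per subset (sub-sampling)

    loss_sum = np.zeros(p)   # cumulative loss per feature
    counts = np.zeros(p)     # number of subsets containing each feature

    for _ in range(n_subsets):
        rows = rng.choice(n, size=n_rows, replace=False)
        feats = rng.choice(p, size=subset_size, replace=False)
        # l2 gap between full-feature and subset-feature distance vectors;
        # a topology-preserving variant would compare persistence diagrams here
        d_full = pdist(X[rows])
        d_sub = pdist(X[np.ix_(rows, feats)])
        loss = np.linalg.norm(d_full - d_sub)
        loss_sum[feats] += loss
        counts[feats] += 1

    # average loss; features never sampled rank last
    scores = np.full(p, np.inf)
    mask = counts > 0
    scores[mask] = loss_sum[mask] / counts[mask]
    return scores

# Usage sketch: keep the d features with the smallest average loss.
# X = np.random.rand(500, 200); d = 20
# selected = np.argsort(ivfs_scores(X))[:d]
```

The row sub-sampling inside each round is what keeps the per-subset cost low, which is consistent with the abstract's claim that a small sub-sampling rate suffices for large-scale data.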
