IVFS: Simple and Efficient Feature Selection for High Dimensional Topology Preservation

Feature selection is an important tool for dealing with high dimensional data. In the unsupervised case, many popular algorithms aim at maintaining the structure of the original data. In this paper, we propose a simple and effective feature selection algorithm that enhances sample similarity preservation from a new perspective, topology preservation, represented by persistence diagrams from computational topology. The method is built on a unified feature selection framework called IVFS, which is inspired by the random subset method. The scheme is flexible and can handle cases where the problem is analytically intractable. The proposed algorithm preserves both the pairwise distances and the topological patterns of the full data well. We demonstrate that the algorithm provides satisfactory performance even at a small sub-sampling rate, which supports efficient application of the proposed method to large-scale datasets. Extensive experiments validate the effectiveness of the proposed feature selection scheme.
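To make the random-subset scheme sketched in the abstract concrete, below is a minimal Python sketch of one possible instantiation: features are scored by how well randomly sampled feature subsets preserve pairwise distances on sub-sampled rows, and each feature's score is the average loss over the subsets that contained it. The function name `ivfs_scores`, the default parameter choices, and the plain l2 loss on distance vectors are illustrative assumptions, not the paper's exact specification; the topology-preserving variant would instead compare persistence diagrams (e.g., computed with a TDA library) rather than raw distances.

```python
import numpy as np
from scipy.spatial.distance import pdist

def ivfs_scores(X, n_subsets=1000, subset_size=None, n_rows=None, seed=0):
    """Random-subset feature scoring in the spirit of the IVFS framework.

    Each round draws a random feature subset and a random sub-sample of
    rows, then measures how well pairwise distances computed on the
    feature subset match those computed on all features. A feature's
    score is its average loss over the subsets that included it
    (lower = better structure preservation).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    subset_size = subset_size or max(1, p // 2)   # features per subset
    n_rows = n_rows or min(n, 100)                # rows per subset (sub-sampling)

    loss_sum = np.zeros(p)   # cumulative loss per feature
    counts = np.zeros(p)     # number of subsets containing each feature

    for _ in range(n_subsets):
        rows = rng.choice(n, size=n_rows, replace=False)
        feats = rng.choice(p, size=subset_size, replace=False)
        # l2 gap between full-feature and subset-feature distance vectors;
        # a topology-preserving variant would compare persistence diagrams here
        d_full = pdist(X[rows])
        d_sub = pdist(X[np.ix_(rows, feats)])
        loss = np.linalg.norm(d_full - d_sub)
        loss_sum[feats] += loss
        counts[feats] += 1

    # average loss; features never sampled rank last
    scores = np.full(p, np.inf)
    mask = counts > 0
    scores[mask] = loss_sum[mask] / counts[mask]
    return scores

# Usage sketch: keep the d features with the smallest average loss.
# X = np.random.rand(500, 200); d = 20
# selected = np.argsort(ivfs_scores(X))[:d]
```

The row sub-sampling inside each round is what keeps the per-subset cost low, which is consistent with the abstract's claim that a small sub-sampling rate suffices for large-scale data.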
