A Hyperplane-Based Algorithm for Semi-Supervised Dimension Reduction

We consider the semi-supervised dimension reduction problem: given a high-dimensional dataset with a small number of labeled samples and a huge number of unlabeled samples, the goal is to find a low-dimensional embedding that yields good classification results. Most previous algorithms for this task are linkage-based: they enforce must-link and cannot-link constraints during dimension reduction, which leads to a nearest-neighbor classifier in the low-dimensional space. In this paper, we propose a new hyperplane-based semi-supervised dimension reduction method whose main objective is to learn low-dimensional features that both approximate the original data and form a good separating hyperplane. We formulate this as a non-convex optimization problem and propose an efficient algorithm to solve it. The algorithm scales to problems with millions of features and can easily incorporate non-negativity constraints in order to learn interpretable non-negative features. Experiments on real-world datasets demonstrate that our hyperplane-based dimension reduction method outperforms state-of-the-art linkage-based methods when very few labels are available.
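
The abstract does not spell out the optimization problem, but a minimal sketch of one plausible joint objective is shown below: an NMF-style reconstruction term plus a hinge loss on the embeddings of the labeled samples, minimized by alternating projected gradient steps. The function name, the specific objective, the update scheme, and all hyperparameters here are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def semi_supervised_dr(X, y, labeled_idx, k, lam=1.0, C=1.0,
                       n_iter=50, lr=1e-3, seed=0):
    """Sketch of a joint objective of the (assumed) form

        min_{W,H,w}  ||X - W H||_F^2
                     + lam * sum_{i in labeled} max(0, 1 - y_i * w^T h_i)
                     + (C/2) * ||w||^2,   with W, H >= 0,

    solved by alternating projected gradient steps.
    X: d x n data matrix; y: labels in {-1, +1} for the labeled columns;
    labeled_idx: indices of labeled columns; k: embedding dimension.
    Constant factors in the gradients are absorbed into the step size lr.
    """
    rng = np.random.default_rng(seed)
    labeled_idx = np.asarray(labeled_idx)
    d, n = X.shape
    W = np.abs(rng.standard_normal((d, k))) * 0.01
    H = np.abs(rng.standard_normal((k, n))) * 0.01
    w = np.zeros(k)
    for _ in range(n_iter):
        # Gradients of the reconstruction term ||X - WH||_F^2.
        R = W @ H - X
        gW = R @ H.T
        gH = W.T @ R
        # Hinge-loss gradients on the labeled embeddings: only points
        # violating the margin (y_i * w^T h_i < 1) contribute.
        margins = y * (w @ H[:, labeled_idx])
        active = margins < 1
        gw = C * w - lam * (H[:, labeled_idx][:, active] * y[active]).sum(axis=1)
        gH[:, labeled_idx[active]] -= lam * np.outer(w, y[active])
        # Projected gradient steps: projection onto the non-negative
        # orthant enforces the non-negativity constraints on W and H.
        W = np.maximum(W - lr * gW, 0.0)
        H = np.maximum(H - lr * gH, 0.0)
        w = w - lr * gw
    return W, H, w

# Toy usage: 100-dim data, 200 points, 5 labeled, 10 latent features.
X = np.abs(np.random.default_rng(1).standard_normal((100, 200)))
labeled_idx = np.arange(5)
y = np.array([1, -1, 1, -1, 1])
W, H, w = semi_supervised_dr(X, y, labeled_idx, k=10)
```

Alternating over (W, H) and w, rather than taking one joint step, would match the block structure of such objectives more closely; the single-loop version above is kept only for brevity.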
