A novel semi-supervised approach for feature extraction

Feature extraction is an essential preprocessing step in machine learning and data mining. In general, supervised feature extraction algorithms, which exploit prior knowledge, outperform unsupervised ones, which do not. Nearly all existing supervised feature extraction algorithms use class labels or pairwise constraints as supervised information. In this paper, we propose to use another form of supervised information, the Universum: a collection of “non-examples” that do not belong to any class or cluster of interest but come from the same domain as the problem at hand. Universum samples are easy to obtain and are more practical and less expensive to collect than class labels. We bring this idea to feature extraction and propose a novel semi-supervised feature extraction approach based on the Universum. Experiments on several UCI data sets compare the proposed algorithms with well-known unsupervised and supervised feature extraction algorithms. The results show that, with very few Universum samples, the proposed algorithms outperform the unsupervised algorithms and achieve performance similar to, or even higher than, that of LDA trained with full class labels on the whole training set.
