A New Discriminant Principal Component Analysis Method with Partial Supervision

Principal component analysis (PCA) is one of the most widely used unsupervised dimensionality reduction methods in pattern recognition. It preserves the global covariance structure of the data when labels are unavailable. In many practical applications, however, besides a large amount of unlabeled data it is also possible to obtain partial supervision, such as a few labeled samples and pairwise constraints, which carries far more discriminative information than the unlabeled data alone. Unfortunately, PCA cannot exploit this discriminant information effectively. On the other hand, traditional supervised dimensionality reduction methods such as linear discriminant analysis (LDA) operate on labeled data only; when labeled data are scarce, their performance deteriorates. In this paper, we propose a novel discriminant PCA (DPCA) model that boosts the discriminant power of PCA when unlabeled data, labeled data, and pairwise constraints are all available. The derived DPCA algorithm is efficient and admits a closed-form solution. Experimental results on several UCI and face data sets show that DPCA is superior to several established dimensionality reduction methods.
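The abstract does not give the DPCA objective itself, so the sketch below only illustrates the general recipe such partially supervised PCA variants tend to follow: combine a total-covariance term computed from all samples with scatter terms built from the labels and the pairwise constraints, then read the projection off an eigendecomposition, which is what makes a closed-form solution possible. The function name dpca_sketch, the trade-off weight beta, and the particular way the constraint terms are combined are illustrative assumptions, not the paper's actual formulation.

import numpy as np

def dpca_sketch(X, labeled_idx, y, must_links, cannot_links,
                beta=1.0, n_components=2):
    """Hypothetical discriminant-PCA-style projection (not the paper's exact model).

    X            : (n, d) array of all samples, labeled and unlabeled
    labeled_idx  : indices of the labeled rows of X
    y            : class labels for X[labeled_idx]
    must_links   : pairs (i, j) of row indices known to share a class
    cannot_links : pairs (i, j) of row indices known to differ in class
    """
    Xc = X - X.mean(axis=0)
    S_t = Xc.T @ Xc / X.shape[0]            # total covariance of all data (PCA term)

    d = X.shape[1]
    S_disc = np.zeros((d, d))

    # Labeled samples: penalize within-class scatter so same-class points compress.
    X_lab = X[labeled_idx]
    for c in np.unique(y):
        X_cls = X_lab[y == c]
        if len(X_cls) > 1:
            diff = X_cls - X_cls.mean(axis=0)
            S_disc -= diff.T @ diff / len(X_cls)

    # Pairwise constraints: cannot-links encourage separation, must-links compactness.
    for i, j in cannot_links:
        d_ij = (X[i] - X[j])[:, None]
        S_disc += d_ij @ d_ij.T
    for i, j in must_links:
        d_ij = (X[i] - X[j])[:, None]
        S_disc -= d_ij @ d_ij.T

    # Closed-form step: top eigenvectors of the combined symmetric matrix.
    M = S_t + beta * S_disc
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]                # project with X @ W

With this structure, the supervision only reshapes the matrix being eigendecomposed, so the cost stays comparable to plain PCA; new data are projected as X @ W, and setting beta to zero recovers standard PCA on the centered data.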
