Sample Weighting: An Inherent Approach for Outlier Suppressing Discriminant Analysis

As data acquisition technologies develop rapidly, both the volume and the variety of data keep growing. However, noise and outliers usually accompany the data and degrade the practical performance of learning algorithms in data mining and pattern analysis. To address this problem, we explore the importance of each individual sample in building the optimal subspace and propose an importance-sampling-inspired method for outlier-suppressing feature extraction. First, we assign each sample a weight, estimated via the graph Laplacian, and then calculate the approximated mean for each subject. By highlighting the most subject-oriented samples, the weighted average and the scatter metrics can be measured with maximum margins, yielding superior classification performance. The supervised information integrates local data structure with the respective contributions of the samples to building the optimal subspace. The linear criterion can be extended to the nonlinear case with the kernel trick. A regularization framework is proposed to deal with the rank-deficiency problem, which is usually induced by the small sample size of the training set. The competitive performance of our algorithm is validated by extensive experiments on synthetic and benchmark data, including facial images and gene microarray data.
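The pipeline described above (per-sample weights, weighted class means, weighted scatter, regularization) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the weight estimator, so the code assumes a simple stand-in in which each sample's weight is its normalized degree in a within-class Gaussian affinity graph (the diagonal of D in the Laplacian L = D - A). The names `sample_weights`, `weighted_lda`, `sigma`, and `reg` are hypothetical, and the ridge term `reg` stands in for the regularization framework only in spirit.

```python
import numpy as np


def sample_weights(X, sigma=1.0):
    """Assumed weighting: normalized degree in a Gaussian affinity graph.

    A sample far from the rest of its class has near-zero affinity to the
    others, hence a small degree and a small weight.
    """
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    deg = A.sum(axis=1)  # diagonal of D in L = D - A
    total = deg.sum()
    if total <= 0.0:  # e.g. a class with a single sample
        return np.full(len(X), 1.0 / len(X))
    return deg / total


def weighted_lda(X, y, n_components=1, sigma=1.0, reg=1e-3):
    """Sample-weighted discriminant projection with a ridge regularizer."""
    n, d = X.shape
    classes = np.unique(y)
    priors, means, weights = [], [], []
    for c in classes:
        Xc = X[y == c]
        w = sample_weights(Xc, sigma)      # weights sum to 1 within the class
        priors.append(len(Xc) / n)
        means.append(w @ Xc)               # weighted class mean
        weights.append(w)
    overall = sum(p * m for p, m in zip(priors, means))

    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c, p, m, w in zip(classes, priors, means, weights):
        R = X[y == c] - m
        Sw += p * (R * w[:, None]).T @ R   # weighted within-class scatter
        diff = (m - overall)[:, None]
        Sb += p * diff @ diff.T            # between-class scatter

    # Regularize Sw so it is invertible even when samples are scarce, then
    # solve the generalized eigenproblem Sb v = lambda (Sw + reg I) v
    # through Cholesky whitening.
    L = np.linalg.cholesky(Sw + reg * np.eye(d))
    A = np.linalg.solve(L, Sb)             # L^{-1} Sb
    M = np.linalg.solve(L, A.T)            # L^{-1} Sb L^{-T}, symmetric
    _, vecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    V = vecs[:, ::-1][:, :n_components]    # leading eigenvectors
    return np.linalg.solve(L.T, V)         # map back: W = L^{-T} V
```

Calling `weighted_lda(X, y)` on an `(n, d)` data matrix with integer labels returns a `(d, n_components)` projection matrix; `X @ W` gives the extracted features. Under the assumed weighting, an isolated outlier contributes almost nothing to its class mean or to the within-class scatter, which is the suppression effect the abstract describes.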
