Multiclass Classification and Feature Selection Based on Least Squares Regression with Large Margin

Least squares regression (LSR) is a fundamental statistical analysis technique that has been widely applied to feature learning. However, because of its simplicity, LSR easily neglects the local structure of the data, and many methods therefore impose an orthogonal constraint to preserve more local information. Another major drawback of LSR is that the loss between soft regression results and hard target values does not precisely reflect classification ability, which motivates the large margin constraint. In this letter, we combine the large margin and orthogonal constraints in a novel algorithm for multiclass classification, orthogonal least squares regression with large margin (OLSLM). The core task of the algorithm is to learn the regression targets from the data and an orthogonal transformation matrix simultaneously, so that the model not only ensures that every data point is correctly classified with a larger margin than in conventional least squares regression but also preserves more local data structure information in the subspace. We solve the model with an efficient iterative optimization method that handles the large margin and orthogonal constraints, and we show that it converges in both theory and practice. We also apply the large margin constraint to build a sparse learning model for feature selection via joint ℓ2,1-norm minimization of both the loss function and the regularization term. Experimental results validate that our method performs better than state-of-the-art methods on various real-world data sets.
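To make the ℓ2,1-norm mentioned above concrete, the following minimal sketch computes it for a small weight matrix and ranks features by row norm, which is the usual way an ℓ2,1-regularized model is read for feature selection. It is only an illustration under stated assumptions: the matrix `W`, its layout (one row per feature, one column per class), and the helper names `row_norms`, `l21_norm`, and `rank_features` are hypothetical and are not taken from the letter, and the code does not implement OLSLM or its optimization.

```python
import math

def row_norms(W):
    """Euclidean (l2) norm of each row of W, given as a list of lists."""
    return [math.sqrt(sum(w * w for w in row)) for row in W]

def l21_norm(W):
    """l2,1-norm: the sum of the l2 norms of the rows of W."""
    return sum(row_norms(W))

def rank_features(W):
    """Feature indices sorted by decreasing row norm.

    Assumption: row i of W holds the weights of feature i across classes,
    so a row with norm near zero marks a feature the model barely uses.
    """
    norms = row_norms(W)
    return sorted(range(len(W)), key=lambda i: norms[i], reverse=True)

# Toy weight matrix (hypothetical values): 3 features, 2 classes.
W = [[3.0, 4.0],
     [0.0, 0.0],
     [1.0, 0.0]]

print(l21_norm(W))       # 6.0  (row norms are 5, 0, and 1)
print(rank_features(W))  # [0, 2, 1]
```

Because the penalty sums row norms rather than squared entries, minimizing it tends to drive entire rows to zero, which is what makes the resulting model sparse at the feature level.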
