MULTICLASS SVM WITH HIERARCHICAL INTERACTION: APPLICATION TO FACE CLASSIFICATION

Standard classification models are usually additive: they consider only the contributions of the main effects of the features. When the features are highly correlated, the interactions between them provide not only additional features but also the underlying graph structure linking the features. In this paper, we integrate a strong hierarchy regularization into the multiclass SVM in order to learn both the main effects and the interactions. A primal-dual proximal algorithm with epigraphical projection is proposed to minimize the objective function. The proposed algorithm is applied to a face classification task on the Extended YaleB database, and the results validate its effectiveness.
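
To make the idea of main effects, interactions, and strong hierarchy concrete, the following is a minimal sketch and not the formulation used in the paper, which the abstract does not spell out. It assumes a per-class score of the form bias + sum_j beta[j]*x[j] + sum_{j<k} theta[j][k]*x[j]*x[k], and it takes "strong hierarchy" in its usual sense: an interaction (j, k) may be nonzero only if both main effects j and k are nonzero. All function and variable names (interaction_score, satisfies_strong_hierarchy, predict, beta, theta) are hypothetical.

```python
def interaction_score(x, beta, theta, bias=0.0):
    """Score of one class: main effects plus pairwise interactions (assumed form)."""
    p = len(x)
    main = sum(beta[j] * x[j] for j in range(p))
    # Each unordered pair (j, k) with j < k is counted once.
    inter = sum(theta[j][k] * x[j] * x[k]
                for j in range(p) for k in range(j + 1, p))
    return bias + main + inter


def satisfies_strong_hierarchy(beta, theta, tol=1e-12):
    """True if every nonzero interaction has both of its main effects nonzero."""
    p = len(beta)
    for j in range(p):
        for k in range(j + 1, p):
            if abs(theta[j][k]) > tol:
                if abs(beta[j]) <= tol or abs(beta[k]) <= tol:
                    return False
    return True


def predict(x, betas, thetas, biases):
    """Multiclass decision rule: index of the class with the largest score."""
    scores = [interaction_score(x, b, t, c)
              for b, t, c in zip(betas, thetas, biases)]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example with 3 features and 2 classes (values chosen for illustration only).
x = [1.0, 2.0, 3.0]
beta0 = [1.0, 0.0, -1.0]
theta0 = [[0.0, 0.0, 0.5],
          [0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]   # only the (0, 2) interaction is active
beta1 = [0.0, 0.0, 0.0]
theta1 = [[0.0] * 3 for _ in range(3)]

score0 = interaction_score(x, beta0, theta0)
label = predict(x, [beta0, beta1], [theta0, theta1], [0.0, 0.0])
```

In this toy setting the (0, 2) interaction is allowed because both beta0[0] and beta0[2] are nonzero; activating theta0[0][1] instead would violate strong hierarchy, since beta0[1] is zero. A regularizer enforcing this structure during training, as well as the primal-dual solver, are outside the scope of this sketch.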
