Multi-label Lagrangian support vector machine with random block coordinate descent method

A multi-label Lagrangian support vector machine is proposed.
Our method runs, on average, much faster than two existing multi-label SVM-type methods.
It produces fewer support vectors than these two SVM-type rivals.
Our new algorithm is a competitive candidate for multi-label classification.

When all training instances and labels are considered together in a single optimization problem, multi-label support and core vector machines (i.e., Rank-SVM and Rank-CVM) are formulated as quadratic programming (QP) problems with equality and bound constraints, and their training procedures have a sub-linear convergence rate. It is therefore highly desirable to design and implement a novel, efficient SVM-type multi-label algorithm. In this paper, by applying pairwise constraints between relevant and irrelevant labels and defining an approximate ranking loss, we generalize the binary Lagrangian support vector machine (LSVM) to a multi-label form (Rank-LSVM). The result is a strictly convex QP problem with non-negative constraints only. In particular, each training instance is associated with a block of variables, so all variables are divided naturally into manageable blocks. Consequently, we build an efficient training procedure for Rank-LSVM using a random block coordinate descent method with a linear convergence rate. Moreover, a heuristic strategy is applied to reduce the number of support vectors. Experimental results on twelve data sets demonstrate that, compared with Rank-CVM and Rank-SVM, our method works better according to five performance measures, runs on average 15 and 107 times faster, and has 9% and 15% fewer support vectors, respectively.
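
To make the pairwise constraints between relevant and irrelevant labels concrete, the following Python sketch enumerates the (relevant, irrelevant) label pairs of a single instance. It then evaluates the standard ranking loss, which is the fraction of those pairs that a score vector orders wrongly, with ties counted as errors. The text does not give the exact form of the paper's approximate ranking loss. The squared-hinge surrogate `squared_hinge_pair_loss` and its `margin` parameter are therefore only an illustrative assumption, chosen because the binary LSVM penalizes squared slacks. All function names are hypothetical.

```python
from itertools import product


def relevant_irrelevant_pairs(labels):
    """All (k, l) with label k relevant (1) and label l irrelevant (0)."""
    relevant = [k for k, y in enumerate(labels) if y == 1]
    irrelevant = [l for l, y in enumerate(labels) if y == 0]
    return list(product(relevant, irrelevant))


def ranking_loss(scores, labels):
    """Fraction of relevant/irrelevant pairs ordered wrongly (ties count as errors)."""
    pairs = relevant_irrelevant_pairs(labels)
    if not pairs:
        return 0.0
    errors = sum(1 for k, l in pairs if scores[k] <= scores[l])
    return errors / len(pairs)


def squared_hinge_pair_loss(scores, labels, margin=1.0):
    """Hypothetical smooth surrogate: mean of max(0, margin - (f_k - f_l))^2 over pairs."""
    pairs = relevant_irrelevant_pairs(labels)
    if not pairs:
        return 0.0
    total = sum(max(0.0, margin - (scores[k] - scores[l])) ** 2 for k, l in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.3, 0.1]
    print(relevant_irrelevant_pairs(labels))
    print(ranking_loss(scores, labels))
    print(squared_hinge_pair_loss(scores, labels))
```

With labels [1, 0, 1, 0] and scores [0.9, 0.8, 0.3, 0.1], four pairs are formed and only (2, 1) is mis-ordered, so the ranking loss is 0.25. The number of pairs grows with the product of the relevant and irrelevant label counts. This is why a per-instance block of pair variables is a natural unit.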

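The training idea can be illustrated on a generic problem with the same shape as the one described above. The problem is to minimize 0.5*a'Qa - sum(a) subject to a >= 0, with Q symmetric positive definite and the variables split into one block per training instance. This sketch is not the paper's Rank-LSVM solver. The following are all assumptions for illustration: the toy matrix Q = I/C + XX' (which mimics the binary LSVM dual, where the identity term makes the QP strictly convex), the block partition, the projected block-gradient update with a Gershgorin step size, and all function names. The paper's support-vector reduction heuristic is not reproduced.

```python
import random


def toy_problem(C=1.0):
    """Hypothetical toy data: Q = I/C + X X^T (strictly convex), 3 blocks of 2 variables."""
    X = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [-0.5, 1.0], [1.0, -1.0], [0.2, 0.3]]
    n = len(X)
    Q = [[(1.0 / C if i == j else 0.0) + sum(p * q for p, q in zip(X[i], X[j]))
          for j in range(n)] for i in range(n)]
    blocks = [[0, 1], [2, 3], [4, 5]]
    return Q, blocks


def block_step_size(Q, block):
    """1 / (Gershgorin upper bound on the largest eigenvalue of Q[block, block])."""
    bound = max(sum(abs(Q[i][j]) for j in block) for i in block)
    return 1.0 / bound


def rbcd_nonneg_qp(Q, blocks, iters=2000, seed=0):
    """Random block coordinate descent for min 0.5*a'Qa - sum(a) s.t. a >= 0.

    Each iteration picks one block uniformly at random and takes a projected
    gradient step on that block only, with step size 1/L_block.
    """
    rng = random.Random(seed)
    n = len(Q)
    a = [0.0] * n
    Qa = [0.0] * n  # Q @ a, maintained incrementally
    steps = [block_step_size(Q, b) for b in blocks]
    for _ in range(iters):
        k = rng.randrange(len(blocks))
        block, step = blocks[k], steps[k]
        # Block gradient of the objective, evaluated before any update in this step.
        grad = {i: Qa[i] - 1.0 for i in block}
        for i in block:
            new = max(0.0, a[i] - step * grad[i])  # projection onto a_i >= 0
            delta = new - a[i]
            if delta != 0.0:
                a[i] = new
                for r in range(n):
                    Qa[r] += Q[r][i] * delta
    return a


def objective(Q, a):
    n = len(a)
    return 0.5 * sum(a[i] * Q[i][j] * a[j] for i in range(n) for j in range(n)) - sum(a)


def kkt_violation(Q, a):
    """Largest projected-gradient entry; zero exactly at the constrained optimum."""
    worst = 0.0
    for i in range(len(a)):
        g = sum(Q[i][j] * a[j] for j in range(len(a))) - 1.0
        pg = g if a[i] > 0.0 else min(0.0, g)
        worst = max(worst, abs(pg))
    return worst


if __name__ == "__main__":
    Q, blocks = toy_problem()
    a = rbcd_nonneg_qp(Q, blocks)
    print([round(v, 4) for v in a], objective(Q, a), kkt_violation(Q, a))
```

Because each step touches only one block, its cost scales with the block size rather than with the whole problem. The strict convexity supplied by the identity term is what allows randomized block methods of this kind to converge linearly. The `kkt_violation` check recomputes the gradient from scratch, so it does not rely on the incrementally maintained Qa.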