Support Matrix Machines

In many classification problems, such as electroencephalogram (EEG) classification and image classification, the input features are naturally represented as matrices rather than as vectors or scalars. The structural information in the original feature matrix is generally useful for data analysis tasks such as classification. A typical example is the correlation between the columns or rows of the feature matrix. To leverage this kind of structural information, we propose a new classification method called the support matrix machine (SMM). Specifically, SMM minimizes a hinge loss plus a so-called spectral elastic net penalty, which is a spectral extension of the conventional elastic net to matrices. The spectral elastic net enjoys a grouping effect: strongly correlated columns or rows tend to be selected or discarded together. Because the SMM optimization problem is convex, we devise an alternating direction method of multipliers (ADMM) algorithm to solve it. Experimental results on EEG and image classification data show that our model is more robust and efficient than state-of-the-art methods.
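As a rough illustration of the ideas above, the sketch below assumes one natural form of the objective: (1/2)||W||_F^2 + tau*||W||_* + C * sum_i max(0, 1 - y_i tr(W^T X_i)). Here the first two terms are the elastic net applied to the singular values of W: the squared l2 term becomes the squared Frobenius norm, and the l1 term becomes the nuclear norm ||W||_*. The weights tau and C, the function names svt, smm_objective and smm_admm, and the inner solver are all assumptions made for this example. In particular, the sketch drops the bias term b entirely and solves the W-subproblem with exact dual coordinate ascent, a choice made here for simplicity. It should not be read as the reference implementation. The ADMM splitting W = S sends the Frobenius and hinge terms to the W-step and the nuclear norm to the S-step, which reduces to singular value thresholding.

```python
import numpy as np


def svt(A, thresh):
    """Singular value thresholding: the proximal operator of thresh * ||.||_*."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - thresh, 0.0)) @ Vt


def smm_objective(W, X, y, tau, C):
    """Spectral elastic net plus hinge loss (bias term omitted in this sketch)."""
    s = np.linalg.svd(W, compute_uv=False)
    margins = y * np.einsum("ipq,pq->i", X, W)  # y_i * tr(W^T X_i)
    hinge = np.maximum(0.0, 1.0 - margins).sum()
    # 0.5 * sum(s^2) = 0.5 * ||W||_F^2 and sum(s) = ||W||_* (nuclear norm).
    return 0.5 * np.sum(s ** 2) + tau * np.sum(s) + C * hinge


def smm_admm(X, y, tau=1.0, C=1.0, rho=1.0, n_iter=100, n_inner=10, seed=0):
    """Hypothetical ADMM sketch for the bias-free problem, splitting W = S.

    X: (n, p, q) array of feature matrices; y: (n,) labels in {-1, +1}.
    The W-step handles the Frobenius and hinge terms, the S-step the nuclear norm.
    """
    rng = np.random.default_rng(seed)
    n, p, q = X.shape
    S = np.zeros((p, q))
    Lam = np.zeros((p, q))  # Lagrange multiplier for the constraint S - W = 0
    beta = np.zeros(n)      # dual variables of the W-step, warm-started across iterations
    a = 1.0 + rho
    sq_norms = np.einsum("ipq,ipq->i", X, X)  # ||X_i||_F^2
    for _ in range(n_iter):
        # W-step: minimize (a/2)||W||_F^2 - <Lam + rho*S, W> + C * sum of hinge terms.
        # Its dual is solved by exact coordinate ascent over beta_i in [0, C],
        # maintaining W = (Lam + rho*S + sum_i beta_i y_i X_i) / a.
        M = Lam + rho * S
        W = (M + np.einsum("i,ipq->pq", beta * y, X)) / a
        for _ in range(n_inner):
            for i in rng.permutation(n):
                if sq_norms[i] == 0.0:
                    continue
                grad = 1.0 - y[i] * np.sum(W * X[i])
                beta_new = np.clip(beta[i] + a * grad / sq_norms[i], 0.0, C)
                W += (beta_new - beta[i]) * y[i] * X[i] / a
                beta[i] = beta_new
        # S-step: proximal step on tau * ||S||_*, i.e. singular value thresholding.
        S = svt(W - Lam / rho, tau / rho)
        # Dual ascent on the multiplier of S - W = 0.
        Lam += rho * (S - W)
    return S
```

For example, with X of shape (n, p, q) and labels y in {-1, +1}, `S = smm_admm(X, y, tau=1.0, C=1.0)` returns a weight matrix, and `np.sign(np.einsum("ipq,pq->i", X, S))` gives predicted labels. A larger tau zeroes out more singular values and so yields a lower-rank W. In the conventional elastic net, the strict convexity contributed by the squared l2 term underlies the grouping effect, and the squared Frobenius term plays the analogous role here.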
