Multi-distance Support Matrix Machines

Real-world data such as digital images, MRI scans, and electroencephalography signals are naturally represented as matrices whose structure carries information. Most existing classifiers attempt to capture this structure by regularizing the regression matrix to be low-rank or sparse. Other methodologies introduce factorization techniques to explore nonlinear relationships among matrix data in kernel space. In this paper, we propose a multi-distance support matrix machine (MDSMM), which provides a principled way of solving matrix classification problems. The multi-distance is introduced to capture the correlation within matrix data through the intrinsic information carried by the rows and columns of the input. A complex hyperplane is then established over these distance values to separate distinct classes. We further study generalization bounds for i.i.d. and non-i.i.d. processes based on both SVM and SMM classifiers. For typical hypothesis classes with constrained matrix norms, MDSMM achieves a faster learning rate than traditional classifiers. We also provide a more general approach for samples without prior knowledge. We demonstrate the merits of the proposed method with extensive experiments on both a simulation study and a number of real-world datasets.
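The sketch below illustrates the core idea behind the multi-distance in NumPy: measuring separation between matrix samples along their rows and columns, rather than flattening them into vectors, preserves structural information. The function `multi_distance` and the particular choice of Euclidean row/column norms are illustrative assumptions for exposition, not the paper's exact formulation.

```python
# A minimal, hypothetical sketch of the multi-distance idea: for two matrix
# samples X and Y, measure separation along rows and columns instead of
# flattening to vectors. The combination used here is an assumption.
import numpy as np

def multi_distance(X, Y):
    """Return row-wise, column-wise, and overall distances between X and Y."""
    D = X - Y
    row_dist = np.linalg.norm(D, axis=1)   # one distance per row
    col_dist = np.linalg.norm(D, axis=0)   # one distance per column
    frob = np.linalg.norm(D)               # overall (Frobenius) distance
    return row_dist, col_dist, frob

# Example: two 3x4 matrix samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
Y = X + 0.5                                # shifted copy as a second sample
rows, cols, frob = multi_distance(X, Y)
print(rows.shape, cols.shape, frob)        # (3,) (4,) scalar
```

In a classifier of the kind described above, such row, column, and whole-matrix distance values would jointly feed the decision function, so that the hyperplane is fit over structured distances rather than a single flattened vector.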
