Reinforced SVM method and memorization mechanisms

Abstract

The paper is devoted to two problems: (1) reinforcement of SVM algorithms, and (2) justification of memorization mechanisms for generalization.

(1) The classical SVM algorithm was designed for the case in which the risk over the set of nonnegative slack variables is defined by the l1 norm. In this paper, along with that classical l1 norm, we consider risks defined by the l2 norm and the l∞ norm. Using these norms, we formulate several modifications of the existing SVM algorithm and show that the resulting modified algorithms can improve (sometimes significantly) classification performance.

(2) The generalization ability of existing learning algorithms is usually explained by arguments involving uniform convergence of empirical losses to the corresponding expected losses over a given set of functions. However, along with bounds for uniform convergence of empirical losses to expected losses, VC theory also provides bounds for relative uniform convergence, which lead to more accurate estimates of the expected loss. Advanced methods of estimating the expected risk of error should leverage these bounds; the same bounds also support mechanisms of training-data memorization, which, as the paper demonstrates, can improve classification performance.
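To make the role of the slack-norm choice concrete, the following is a sketch of the standard soft-margin primal with the three slack risks mentioned above; the paper's reinforced formulations may differ in detail, so this should be read as an illustration rather than the authors' exact construction.

\[
\min_{w,\,b,\,\xi \ge 0} \;\; \frac{1}{2}\|w\|^2 + C\,\mathcal{R}(\xi)
\quad \text{subject to} \quad y_i\left(\langle w, x_i\rangle + b\right) \ge 1 - \xi_i, \quad i = 1, \dots, \ell,
\]

where the slack risk \(\mathcal{R}(\xi)\) equals \(\sum_{i=1}^{\ell} \xi_i\) in the classical l1 case, \(\sum_{i=1}^{\ell} \xi_i^2\) in the l2 case, and \(\max_i \xi_i\) in the l∞ case. All three choices keep the problem convex; the l∞ case can be rewritten as a quadratic program by introducing a single auxiliary variable that bounds all slacks.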
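The relative-uniform-convergence argument referenced in (2) can be illustrated by one commonly quoted consequence of VC theory for indicator (0/1) losses; the exact constants vary across statements, so the form below is indicative only. With probability at least \(1 - \eta\), simultaneously for all functions \(f\) in a class of VC dimension \(h\),

\[
R(f) \;\le\; R_{\mathrm{emp}}(f) + \frac{\mathcal{E}}{2}\left(1 + \sqrt{1 + \frac{4\,R_{\mathrm{emp}}(f)}{\mathcal{E}}}\right),
\qquad
\mathcal{E} = \frac{4}{\ell}\left(h\left(\ln\frac{2\ell}{h} + 1\right) - \ln\frac{\eta}{4}\right),
\]

where \(\ell\) is the training-set size, \(R(f)\) is the expected loss, and \(R_{\mathrm{emp}}(f)\) is the empirical loss. When \(R_{\mathrm{emp}}(f)\) is close to zero, as for a rule that memorizes the training data, the bound behaves like \(O(\mathcal{E})\) rather than the \(O(\sqrt{\mathcal{E}})\) rate of the plain uniform-convergence bound, which is the sense in which relative bounds yield a more accurate estimate of the expected loss and support memorization mechanisms.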
