Generalization Bounds for Vicinal Risk Minimization Principle

The vicinal risk minimization (VRM) principle, first proposed by \citet{vapnik1999nature}, is a variant of empirical risk minimization (ERM) that replaces Dirac masses with vicinal functions. Although there is strong numerical evidence that VRM outperforms ERM when appropriate vicinal functions are chosen, a comprehensive theoretical understanding of VRM is still lacking. In this paper, we study generalization bounds for VRM. Our results support Vapnik's original arguments and provide deeper insights into VRM. First, we prove that the complexity of a function class convolved with vicinal functions can be controlled by that of the original function class, under the assumption that the class consists of Lipschitz-continuous functions. The resulting generalization bounds for VRM then suggest that the generalization performance of VRM is affected by both the choice of vicinal function and the quality of the function class. These findings can be used to examine whether a chosen vicinal function is appropriate for a given VRM-based learning setting. Finally, we provide a theoretical explanation for existing VRM models, e.g., uniform distribution-based models, Gaussian distribution-based models, and mixup models.
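
As a brief illustrative sketch (the notation below is not taken from the paper itself): writing $\nu_i$ for the vicinity distribution placed around the $i$-th training pair $(x_i, y_i)$ and $\ell$ for the loss, the vicinal risk replaces the Dirac masses of ERM with
\[
  R_{\mathrm{vic}}(f) \;=\; \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}_{(\tilde{x}, \tilde{y}) \sim \nu_i}\bigl[\ell\bigl(f(\tilde{x}), \tilde{y}\bigr)\bigr],
\]
so that, for example, Gaussian vicinal functions take $\nu_i = \mathcal{N}(x_i, \sigma^2 I) \otimes \delta_{y_i}$, uniform vicinal functions spread mass uniformly over a neighborhood of $x_i$, and mixup places mass on convex combinations $\lambda (x_i, y_i) + (1-\lambda)(x_j, y_j)$.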
