Randomized learning: Generalization performance of old and new theoretically grounded algorithms

Abstract PAC-Bayes and Differential Privacy (DP) theories are the state-of-the-art tools for assessing the generalization abilities of randomized models and learning algorithms. In this paper, we develop tight DP-based generalization bounds that improve over the current state-of-the-art ones both in the constants and in the rate of convergence. Moreover, we prove that several old and new randomized algorithms exhibit better generalization performance than their non-private counterparts when DP is exploited to assess their generalization ability. Results on a series of algorithms and real-world problems show the practical validity of the theoretical findings.
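As a minimal illustration of the differential-privacy machinery that such bounds build on (this is the textbook Laplace mechanism, not one of the paper's own algorithms, and the function names are ours), a query of bounded sensitivity can be privatized by adding calibrated Laplace noise:

```python
import math
import random

def laplace_noise(scale, rng=random):
    """Sample from Laplace(0, scale) via the inverse-CDF transform."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_mean(data, epsilon, lo, hi, rng=random):
    """epsilon-DP estimate of the mean of values clipped to [lo, hi].

    Clipping bounds the sensitivity of the mean by (hi - lo) / n, so
    Laplace noise with scale sensitivity / epsilon yields epsilon-DP.
    """
    n = len(data)
    clipped = [min(max(x, lo), hi) for x in data]
    sensitivity = (hi - lo) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)
```

A smaller epsilon means stronger privacy and therefore more noise; the DP-based generalization bounds trade off exactly this privacy level against the accuracy of the released statistic.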
