Fully Empirical and Data-Dependent Stability-Based Bounds

The purpose of this paper is to derive a fully empirical, stability-based bound on the generalization ability of a learning procedure, thereby circumventing some limitations of the structural risk minimization framework. We show that assuming a desirable property of the learning algorithm suffices to make the data dependency of stability explicit; stability is otherwise usually bounded only in an algorithm-dependent, data-independent way. In addition, we prove that a well-known and widely used classifier, the support vector machine (SVM), satisfies this condition. The resulting bound is then exploited for model selection in SVM classification and tested on a series of real-world benchmark datasets, demonstrating the practical effectiveness of our approach.
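For background, the classical polynomial hypothesis-stability bound of Bousquet and Elisseeff (2002) illustrates the kind of result the paper makes fully empirical. For a learning algorithm A with hypothesis stability beta, a loss bounded by M, and a training sample S of size n, it states that with probability at least 1 - delta:

```latex
R(A_S) \;\le\; \hat{R}_{\mathrm{emp}}(A_S) \;+\; \sqrt{\frac{M^2 + 12\, M\, n\, \beta}{2\, n\, \delta}}
```

The term beta in this classical form is bounded analytically for each algorithm; a "fully empirical" bound instead estimates the stability term from the data at hand. The sketch below is a minimal, hypothetical illustration of stability-penalized model selection for an SVM, not the paper's algorithm: beta is estimated by a crude Monte Carlo over leave-one-out retrainings, the penalty follows the classical bound above, the classifier is scikit-learn's SVC, and the helper names `empirical_hypothesis_stability` and `stability_penalty` are illustrative assumptions.

```python
# Hypothetical sketch: select SVM hyperparameters by minimizing
# empirical risk + a stability-based penalty. Not the paper's method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

def empirical_hypothesis_stability(C, gamma, X, y, n_rounds=20):
    """Crude Monte Carlo estimate of hypothesis stability beta: the mean
    absolute change in the 0/1 loss at a random point when one randomly
    chosen training example is deleted and the SVM is retrained."""
    n = len(y)
    full = SVC(C=C, gamma=gamma).fit(X, y)   # model trained on the full sample
    diffs = []
    for _ in range(n_rounds):
        i = int(rng.integers(n))             # example to delete
        j = int(rng.integers(n))             # example where losses are compared
        mask = np.ones(n, dtype=bool)
        mask[i] = False
        loo = SVC(C=C, gamma=gamma).fit(X[mask], y[mask])
        loss_full = float(full.predict(X[j:j + 1])[0] != y[j])
        loss_loo = float(loo.predict(X[j:j + 1])[0] != y[j])
        diffs.append(abs(loss_full - loss_loo))
    return float(np.mean(diffs))

def stability_penalty(beta, n, delta=0.05, M=1.0):
    # Classical polynomial bound: sqrt((M^2 + 12*M*n*beta) / (2*n*delta)).
    return float(np.sqrt((M**2 + 12 * M * n * beta) / (2 * n * delta)))

best = None
for C in [0.1, 1.0, 10.0]:
    for gamma in [0.01, 0.1, 1.0]:
        clf = SVC(C=C, gamma=gamma).fit(X, y)
        emp_risk = 1.0 - clf.score(X, y)     # training (empirical) error
        beta = empirical_hypothesis_stability(C, gamma, X, y)
        bound = emp_risk + stability_penalty(beta, len(y))
        if best is None or bound < best[0]:
            best = (bound, C, gamma)

print("selected (bound, C, gamma):", best)
```

Unlike cross-validation, this selection rule charges each hyperparameter configuration for its estimated sensitivity to single-point perturbations, so an overfitting configuration with low training error but high instability is penalized.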
