On Robustness Properties of Convex Risk Minimization Methods for Pattern Recognition

The paper brings together methods from two disciplines: machine learning theory and robust statistics. We argue that robustness is an important aspect and show that many existing machine learning methods based on the convex risk minimization principle have, besides other good properties, the advantage of being robust. Robustness properties of these methods are investigated for the problem of pattern recognition. Assumptions are given for the existence of the influence function of the classifiers and for bounds on the influence function. Kernel logistic regression, support vector machines, least squares, and the AdaBoost loss function are treated as special cases. Some results on the robustness of such methods are also obtained for the sensitivity curve and the maxbias, which are two other robustness criteria. A sensitivity analysis of the support vector machine is given.
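
For orientation, here is a minimal sketch in LaTeX of the two central objects the abstract refers to: the regularized convex risk functional whose minimizer defines the classifier, and Hampel's influence function used as the robustness criterion. The specific losses listed below (hinge, logistic, least squares, exponential) are the standard margin-based forms usually associated with SVMs, kernel logistic regression, least squares, and AdaBoost; the notation (L, H, lambda, delta_z) follows common usage and is not taken verbatim from the paper.

% Regularized convex risk minimization over a reproducing kernel Hilbert space H:
% the classifier is a minimizer f_{P,\lambda} of the regularized L-risk under distribution P.
\[
  f_{P,\lambda} \;=\; \arg\min_{f \in H}\; \mathbb{E}_{P}\, L\bigl(Y, f(X)\bigr) \;+\; \lambda \,\|f\|_{H}^{2},
  \qquad \lambda > 0 .
\]
% Convex margin-based losses commonly treated as special cases, with y \in \{-1,+1\}:
\[
  L_{\mathrm{hinge}}(y,t) = \max\{0,\, 1 - y t\}, \qquad
  L_{\mathrm{logistic}}(y,t) = \ln\bigl(1 + e^{-y t}\bigr),
\]
\[
  L_{\mathrm{LS}}(y,t) = (1 - y t)^{2}, \qquad
  L_{\mathrm{exp}}(y,t) = e^{-y t}.
\]
% Hampel's influence function of the map T : P \mapsto f_{P,\lambda} at a point z = (x, y):
% the first-order effect on the classifier of an infinitesimal contamination of P by \delta_z.
\[
  \mathrm{IF}(z;\, T, P) \;=\; \lim_{\varepsilon \downarrow 0}
  \frac{T\bigl((1-\varepsilon)P + \varepsilon \delta_{z}\bigr) - T(P)}{\varepsilon}.
\]

A bounded influence function means that a single contaminating observation can change the resulting classifier only by a bounded amount, which is the sense of robustness studied for the methods above.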
