On Stochastic Optimization and Statistical Learning in Reproducing Kernel Hilbert Spaces by Support Vector Machines (SVM)

The paper studies stochastic optimization problems in Reproducing Kernel Hilbert Spaces (RKHS). The objective function of such problems is a mathematical expectation functional that depends on decision rules (or strategies), i.e. on functions of observed random parameters. Feasible rules are restricted to belong to an RKHS. Problems of this kind arise in on-line decision making and in statistical learning theory. We solve the problem by sample average approximation combined with Tikhonov regularization, and we establish sufficient conditions for uniform convergence of the approximate solutions with probability one, together with a rule for decreasing the regularization factor as the sample size increases.
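The combination of sample average approximation and Tikhonov regularization can be illustrated with a minimal sketch. It assumes a squared loss, a Gaussian kernel, and a regularization schedule lam_n = n^(-1/2); none of these choices is taken from the paper, whose conditions on the loss and on the regularization rule are not stated in this abstract. Under these assumptions the regularized empirical problem, minimizing (1/n) sum_i (f(x_i) - y_i)^2 + lam * ||f||^2 over the RKHS, has the closed-form solution f = sum_i alpha_i k(x_i, .) with alpha = (K + n * lam * I)^(-1) y. The names `gaussian_kernel`, `fit_regularized_saa`, and `regularization_schedule` are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, width=0.2):
    # Gram matrix of a Gaussian kernel on the real line (assumed kernel and width).
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_regularized_saa(x, y, lam):
    # Minimize (1/n) * sum_i (f(x_i) - y_i)^2 + lam * ||f||^2 over the RKHS.
    # By the representer theorem f = sum_i alpha_i * k(x_i, .), which gives
    # alpha = (K + n * lam * I)^{-1} y.
    n = len(x)
    K = gaussian_kernel(x, x)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    # Return the fitted decision rule as a callable.
    return lambda t: gaussian_kernel(np.atleast_1d(t), x) @ alpha

def regularization_schedule(n):
    # Hypothetical rule: shrink the regularization factor as the sample grows.
    return n ** -0.5

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 200)
target = np.sin(2.0 * np.pi * grid)

# Sup-norm error on a grid for increasing sample sizes.
errors = {}
for n in (50, 200, 800):
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(n)
    rule = fit_regularized_saa(x, y, regularization_schedule(n))
    errors[n] = float(np.max(np.abs(rule(grid) - target)))

for n, err in errors.items():
    print(n, err)
```

The sup-norm error over the grid serves here as a rough empirical stand-in for the uniform convergence discussed in the abstract; it is a numerical illustration only and does not verify the paper's conditions.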
