The Nature of Statistical Learning Theory

Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
