Regularization Networks and Support Vector Machines

Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples, in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik's theory of statistical learning, which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.

Tomaso A. Poggio | Massimiliano Pontil | Theodoros Evgeniou
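
The regression setting described in the abstract can be illustrated with a minimal sketch of a regularization network using a Gaussian radial basis function kernel. This is not code from the paper. It assumes the standard formulation in which one minimizes the mean squared error plus `lam` times the squared RKHS norm, so that the solution is f(x) = sum_i c_i K(x, x_i) with coefficients solving (K + lam * n * I) c = y. The function names, the kernel width `sigma`, the value of `lam`, and the toy sine data are all illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=0.5):
    # Pairwise Gaussian (RBF) kernel between the rows of a and the rows of b.
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def fit_regularization_network(x, y, lam=1e-3, sigma=0.5):
    # Solve (K + lam * n * I) c = y for the expansion coefficients c.
    n = x.shape[0]
    k = gaussian_kernel(x, x, sigma)
    return np.linalg.solve(k + lam * n * np.eye(n), y)

def predict(x_train, c, x_new, sigma=0.5):
    # Evaluate f(x) = sum_i c_i K(x, x_i) at the new points.
    return gaussian_kernel(x_new, x_train, sigma) @ c

# Toy data (hypothetical): 20 sparse samples of a sine curve on [-1, 1].
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=(20, 1))
y_train = np.sin(np.pi * x_train[:, 0])

c = fit_regularization_network(x_train, y_train)
x_test = np.linspace(-0.9, 0.9, 50)[:, None]
y_pred = predict(x_train, c, x_test)
max_err = np.max(np.abs(y_pred - np.sin(np.pi * x_test[:, 0])))
```

In this sketch a larger `lam` yields a smoother fit at the cost of a larger error on the training points. Support Vector Machine regression would keep the same kernel expansion but replace the squared loss with an epsilon-insensitive loss, which requires a quadratic programming solver rather than a single linear solve.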
