
Paper No. 171: A unified framework for Regularization Networks and Support Vector Machines

Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples, in particular the regression problem of approximating a multivariate function from sparse data. We present both formulations in a unified framework, namely in the context of Vapnik's theory of statistical learning, which provides a general foundation for the learning problem by combining functional analysis and statistics.

Copyright © Massachusetts Institute of Technology, 1998.

This report describes research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by the National Science Foundation under contract No. IIS-9800032, the Office of Naval Research under contract No. N00014-93-1-0385 and contract No. N00014-95-1-0600. Partial support was also provided by Daimler-Benz AG, Eastman Kodak, Siemens Corporate Research, Inc., ATR and AT&T.

Authors: T. Poggio, M. Pontil, T. Evgeniou
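To make the "unified framework" in the abstract concrete, here is a minimal sketch. It assumes the usual formulation in which both methods minimize a functional H[f] = (1/ℓ) Σ_i V(y_i, f(x_i)) + λ‖f‖²_K over a reproducing kernel Hilbert space, with solutions of the form f(x) = Σ_i c_i K(x, x_i). Only the loss V differs. The square loss gives a Regularization Network, whose coefficients solve the linear system (K + λℓI)c = y. Vapnik's ε-insensitive loss gives Support Vector Machine regression, which is normally solved as a quadratic program and is not solved here; the sketch only evaluates its objective. The Gaussian kernel, σ = 1, ε = 0.1, and every function name are illustrative assumptions, not taken from the report.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]
Loss = Callable[[float, float], float]


def gaussian_kernel(x: Vector, z: Vector, sigma: float = 1.0) -> float:
    """Assumed kernel choice: K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def square_loss(y: float, fx: float) -> float:
    """Regularization Network loss: V(y, f(x)) = (y - f(x))^2."""
    return (y - fx) ** 2


def eps_insensitive_loss(y: float, fx: float, eps: float = 0.1) -> float:
    """Vapnik's epsilon-insensitive loss (SVM regression): max(|y - f(x)| - eps, 0)."""
    return max(abs(y - fx) - eps, 0.0)


def gram(X: List[Vector], kernel: Kernel) -> List[List[float]]:
    """Kernel matrix with K[i][j] = K(x_i, x_j)."""
    return [[kernel(a, b) for b in X] for a in X]


def predict(c: List[float], X: List[Vector], x: Vector, kernel: Kernel) -> float:
    """Kernel expansion f(x) = sum_i c_i K(x, x_i)."""
    return sum(ci * kernel(x, xi) for ci, xi in zip(c, X))


def functional(c: List[float], X: List[Vector], y: List[float],
               lam: float, loss: Loss, kernel: Kernel) -> float:
    """H[f] = (1/n) sum_i V(y_i, f(x_i)) + lam * ||f||_K^2, with ||f||_K^2 = c^T K c."""
    K = gram(X, kernel)
    n = len(X)
    fitted = [sum(K[i][j] * c[j] for j in range(n)) for i in range(n)]  # f(x_i)
    data_term = sum(loss(yi, fi) for yi, fi in zip(y, fitted)) / n
    norm_sq = sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))
    return data_term + lam * norm_sq


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def fit_regularization_network(X: List[Vector], y: List[float],
                               lam: float, kernel: Kernel) -> List[float]:
    """Square-loss minimizer of H[f]: coefficients solve (K + lam * n * I) c = y."""
    n = len(X)
    K = gram(X, kernel)
    A = [[K[i][j] + (lam * n if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(A, y)
```

For example, fitting four samples of sin(x) with λ = 0.01 and then calling `functional(c, X, y, 0.01, eps_insensitive_loss, gaussian_kernel)` evaluates the SVM-style objective at the Regularization Network solution. The two objectives share the kernel expansion and the stabilizer λ‖f‖²_K and differ only in the data term.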