
Functional optimization by variable-basis approximation schemes

This is a summary of the author’s PhD thesis, supervised by Marcello Sanguineti and defended on April 2, 2009 at Università degli Studi di Genova. The thesis is written in English and a copy is available from the author upon request. It investigates functional optimization problems arising in Operations Research. In such problems, a cost functional Φ has to be minimized over an admissible set S of d-variable functions. Since closed-form solutions cannot be derived in general, suboptimal solutions are sought in the form of variable-basis functions, i.e., elements of the set $${\rm span}_n\, G$$ of linear combinations of at most n elements from a set G of computational units. Upper bounds on $${\inf_{f \in S \cap {\rm span}_n\, G}\Phi(f)-\inf_{f \in S}\Phi(f)}$$ are obtained. Conditions are derived under which these estimates do not exhibit the so-called “curse of dimensionality” in the number n of computational units as the number d of variables grows. The problems considered include dynamic optimization, team optimization, and supervised learning from data.
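
For concreteness, the set of variable-basis functions with at most n units can be written out as below. This is a sketch of the usual definition, not a quotation from the thesis: the symbols $$c_i$$ (coefficients) and $$g_i$$ (units) are notation introduced here, and real-valued coefficients are an assumption.

$$\mathrm{span}_n\, G = \Big\{ \sum_{i=1}^{n} c_i\, g_i \;:\; c_i \in \mathbb{R},\ g_i \in G \Big\}$$

Unlike a fixed-basis scheme, both the coefficients $$c_i$$ and the units $$g_i$$ are free to be chosen. Because $$S \cap \mathrm{span}_n\, G \subseteq S$$, the infimum over the smaller set cannot be lower than the infimum over S, so the difference bounded in the thesis is nonnegative:

$$\inf_{f \in S \cap \mathrm{span}_n\, G}\Phi(f) - \inf_{f \in S}\Phi(f) \;\ge\; 0$$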

Giorgio Gnecco
