Functional optimization by variable-basis approximation schemes

This is a summary of the author’s PhD thesis, supervised by Marcello Sanguineti and defended on April 2, 2009 at the Università degli Studi di Genova. The thesis is written in English, and a copy is available from the author upon request. The thesis investigates functional optimization problems arising in Operations Research. In such problems, a cost functional Φ has to be minimized over an admissible set S of d-variable functions. Since closed-form solutions cannot be derived in general, suboptimal solutions are sought in the form of variable-basis functions, i.e., elements of the set $${\rm span}_n\, G$$ of linear combinations of at most n elements from a set G of computational units. Upper bounds on $${\inf_{f \in S \cap {\rm span}_n\, G}\Phi(f)-\inf_{f \in S}\Phi(f)}$$ are obtained. Conditions are derived under which these estimates do not exhibit the so-called “curse of dimensionality” in the number n of computational units as the number d of variables grows. The problems considered include dynamic optimization, team optimization, and supervised learning from data.
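
For concreteness, the set of variable-basis functions can be written out explicitly. The expression below is the standard definition, given here as a sketch rather than quoted from the thesis; the symbols $$c_i$$ and $$g_i$$ are introduced only for illustration, and real coefficients are an assumption.

$${\rm span}_n\, G = \left\{ \sum_{i=1}^{n} c_i\, g_i \;:\; c_i \in \mathbb{R},\ g_i \in G \right\}$$

Both the coefficients and the elements selected from G are free to vary, which is what distinguishes this scheme from linear approximation over a fixed basis. Since $${\rm span}_n\, G \subseteq {\rm span}_{n+1}\, G$$, the difference between the two infima above is nonnegative and nonincreasing in n.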
