Least Squares Solutions of the HJB Equation With Neural Network Value-Function Approximators
Yuval Tassa | Tom Erez
[1] J. Halton. On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals, 1960.
[2] D. Kleinman. On an iterative technique for Riccati equation computations, 1968.
[3] David Q. Mayne, et al. Differential dynamic programming, 1972, The Mathematical Gazette.
[4] George N. Saridis, et al. An Approximation Theory of Optimal Control for Trainable Manipulators, 1979, IEEE Transactions on Systems, Man, and Cybernetics.
[5] P. Lions, et al. Viscosity solutions of Hamilton-Jacobi equations, 1983.
[6] F. J. Pineda. Generalization of back-propagation to recurrent neural networks, 1987, Physical Review Letters.
[7] Ken-ichi Funahashi, et al. On the approximate realization of continuous mappings by neural networks, 1989, Neural Networks.
[8] Kumpati S. Narendra, et al. Identification and control of dynamical systems using neural networks, 1990, IEEE Trans. Neural Networks.
[9] Bernard Widrow, et al. Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights, 1990, IJCNN International Joint Conference on Neural Networks.
[10] Yann LeCun, et al. Tangent Prop - A Formalism for Specifying Selected Invariances in an Adaptive Network, 1991, NIPS.
[11] G. Tesauro. Practical Issues in Temporal Difference Learning, 1992.
[12] W. Fleming, et al. Controlled Markov processes and viscosity solutions, 1992.
[13] C. J. Goh, et al. On the nonlinear optimal regulator problem, 1993, Autom.
[14] Mokhtar S. Bazaraa, et al. Nonlinear Programming: Theory and Algorithms, 1993.
[15] Andrew W. Moore, et al. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces, 2004, Machine Learning.
[16] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[17] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian, 1994, Neural Computation.
[18] Kenji Doya, et al. Temporal Difference Learning in Continuous Time and Space, 1995, NIPS.
[19] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[20] S. Lyashevskiy, et al. Control system analysis and design upon the Lyapunov method, 1995, Proceedings of 1995 American Control Conference - ACC'95.
[21] Peter Dayan, et al. A Neural Substrate of Prediction and Reward, 1997, Science.
[22] Jun-Ho Oh, et al. Hybrid Learning of Mapping and its Jacobian in Multilayer Neural Networks, 1996, Neural Computation.
[23] Randal W. Beard. Successive Galerkin approximation algorithms for nonlinear optimal and robust control, 1998.
[24] Andrew W. Moore, et al. Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems, 1999, IJCAI.
[25] Sebastian Thrun, et al. Issues in Using Function Approximation for Reinforcement Learning, 1999.
[26] Andrew W. Moore, et al. Gradient descent approaches to neural-net-based solutions of the Hamilton-Jacobi-Bellman equation, 1999, IJCNN'99, International Joint Conference on Neural Networks, Proceedings.
[27] Nicol N. Schraudolph, et al. Local Gain Adaptation in Stochastic Gradient Descent, 1999.
[28] James A. Sethian, et al. Level Set Methods and Fast Marching Methods, 1999.
[29] Kenji Doya, et al. Reinforcement Learning in Continuous Time and Space, 2000, Neural Computation.
[30] Ryusuke Masuoka, et al. Neural Networks Learning Differential Data, 2000.
[31] Michail G. Lagoudakis, et al. Least-Squares Methods in Reinforcement Learning for Control, 2002, SETN.
[32] Rémi Coulom, et al. Reinforcement Learning Using Neural Networks, with Applications to Motor Control. (Apprentissage par renforcement utilisant des réseaux de neurones, avec des applications au contrôle moteur), 2002.
[33] Fernando Pérez-Cruz, et al. Support Vector Regression for the simultaneous learning of a multivariate function and its derivatives, 2005, Neurocomputing.
[34] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[35] Frank L. Lewis, et al. Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach, 2005, Autom.
[36] Katta G. Murty, et al. Nonlinear Programming Theory and Algorithms, 2007, Technometrics.