Approximate Solutions to Markov Decision Processes