The Linear Programming Approach to Approximate Dynamic Programming
[1] R. Bellman,et al. Functional Approximations and Dynamic Programming , 1959 .
[2] A. S. Manne. Linear Programming and Sequential Decisions , 1960 .
[3] F. d'Epenoux,et al. A Probabilistic Production and Inventory Problem , 1963 .
[4] Rutherford Aris,et al. Discrete Dynamic Programming , 1965, The Mathematical Gazette.
[5] D. Luenberger. Optimization by Vector Space Methods , 1968 .
[6] A. F. Veinott. Discrete Dynamic Programming with Sensitive Discount Optimality Criteria , 1969 .
[7] E. Denardo. On Linear Programming in a Markov Decision Problem , 1970 .
[8] R. Dudley. Central Limit Theorems for Empirical Measures , 1978 .
[9] A. Hordijk,et al. Linear Programming and Markov Decision Chains , 1979 .
[10] P. Schweitzer,et al. Generalized polynomial approximations in Markovian decision processes , 1985 .
[11] V. Borkar. A convex analytic approach to Markov decision processes , 1988 .
[12] David Haussler,et al. Equivalence of models for polynomial learnability , 1988, COLT '88.
[13] P. R. Kumar,et al. Dynamic instabilities and stabilization methods in distributed real-time scheduling of manufacturing systems , 1989, Proceedings of the 28th IEEE Conference on Decision and Control.
[14] R. Durrett. Probability: Theory and Examples , 1993 .
[15] Martin Grötschel,et al. Solution of large-scale symmetric travelling salesman problems , 1991, Math. Program..
[16] A. Michael,et al. A Linear Programming Approach to Solving Stochastic Dynamic Programs , 1993 .
[17] Heekuck Oh,et al. Neural Networks for Pattern Recognition , 1993, Adv. Comput..
[18] Sean P. Meyn,et al. Duality and linear programs for stability and performance analysis of queueing networks and scheduling policies , 1994, Proceedings of 1994 33rd IEEE Conference on Decision and Control.
[19] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[20] Kenneth L. Clarkson,et al. Las Vegas algorithms for linear and integer programming when the dimension is small , 1995, JACM.
[21] Thomas G. Dietterich,et al. High-Performance Job-Shop Scheduling With A Time-Delay TD(λ) Network , 1995, NIPS 1995.
[22] Andrew G. Barto,et al. Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.
[23] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[24] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[25] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[26] Dimitri P. Bertsekas,et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems , 1996, NIPS.
[27] Mathukumalli Vidyasagar,et al. A Theory of Learning and Generalization , 1997 .
[28] Stanley E. Zin,et al. Spline Approximations to Value Functions: Linear Programming Approach , 1997 .
[29] P. Marbach. Simulation-Based Methods for Markov Decision Processes , 1998 .
[30] Benjamin Van Roy. Learning and value function approximation in complex decision processes , 1998 .
[31] Andrew W. Moore,et al. Gradient Descent for General Reinforcement Learning , 1998, NIPS.
[32] Christine A. Shoemaker,et al. Applying Experimental Design and Regression Splines to High-Dimensional Continuous-State Stochastic Dynamic Programming , 1999, Oper. Res..
[33] J. R. Morrison,et al. New Linear Program Performance Bounds for Queueing Networks , 1999 .
[34] Geoffrey J. Gordon,et al. Approximate solutions to Markov decision processes , 1999 .
[35] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[36] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[37] Vivek S. Borkar,et al. Actor-Critic - Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim..
[38] Sean P. Meyn,et al. Value iteration and optimization of multiclass queueing networks , 1999, Queueing Syst. Theory Appl..
[39] On the Existence of Fixed Points for Approximate Value Iteration and Temporal-Difference Learning , 2000 .
[40] John N. Tsitsiklis,et al. Call admission control and routing in integrated services networks using neuro-dynamic programming , 2000, IEEE Journal on Selected Areas in Communications.
[41] John N. Tsitsiklis,et al. Congestion-dependent pricing of network services , 2000, TNET.
[42] J. Baxter,et al. Direct gradient-based reinforcement learning , 2000, 2000 IEEE International Symposium on Circuits and Systems. Emerging Technologies for the 21st Century. Proceedings (IEEE Cat No.00CH36353).
[43] John N. Tsitsiklis,et al. Regression methods for pricing complex American-style options , 2001, IEEE Trans. Neural Networks.
[44] Francis A. Longstaff,et al. Valuing American Options by Simulation: A Simple Least-Squares Approach , 2001 .
[45] A. W. van der Vaart,et al. Uniform Central Limit Theorems , 2001 .
[46] John N. Tsitsiklis,et al. Simulation-based optimization of Markov reward processes , 2001, IEEE Trans. Autom. Control..
[47] Sean P. Meyn. Sequencing and Routing in Multiclass Queueing Networks Part I: Feedback Regulation , 2001, SIAM J. Control. Optim..
[48] Dale Schuurmans,et al. Direct value-approximation for factored MDPs , 2001, NIPS.
[49] J. Tsitsiklis,et al. Performance of Multiclass Markovian Queueing Networks Via Piecewise Linear Lyapunov Functions , 2001 .
[50] Mark S. Squillante,et al. On maximizing service-level-agreement profits , 2001, PERV.
[51] Mark S. Squillante,et al. On maximizing service-level-agreement profits , 2001, EC.
[52] Benjamin Van Roy. Neuro-Dynamic Programming: Overview and Recent Trends , 2002 .
[53] Shobha Venkataraman,et al. Efficient Solution Algorithms for Factored MDPs , 2003, J. Artif. Intell. Res..
[54] Sean P. Meyn. Sequencing and Routing in Multiclass Queueing Networks Part II: Workload Relaxations , 2003, SIAM J. Control. Optim..
[55] Peter Dayan,et al. The convergence of TD(λ) for general λ , 1992, Machine Learning.
[56] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[57] Benjamin Van Roy,et al. On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming , 2004, Math. Oper. Res..
[58] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.
[59] Giuseppe Carlo Calafiore,et al. Uncertain convex programs: randomized solutions and confidence levels , 2005, Math. Program..
[60] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[61] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.