The linear programming approach to approximate dynamic programming: theory and application
[1] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[2] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 1996, Machine Learning.
[3] Shobha Venkataraman, et al. Efficient Solution Algorithms for Factored MDPs, 2003, J. Artif. Intell. Res..
[4] Benjamin Van Roy. Neuro-Dynamic Programming: Overview and Recent Trends, 2002.
[5] J. Tsitsiklis, et al. Performance of Multiclass Markovian Queueing Networks Via Piecewise Linear Lyapunov Functions, 2001.
[6] Mark S. Squillante, et al. On maximizing service-level-agreement profits, 2001, PERV.
[7] John N. Tsitsiklis, et al. Regression methods for pricing complex American-style options, 2001, IEEE Trans. Neural Networks.
[8] Francis A. Longstaff, et al. Valuing American Options by Simulation: A Simple Least-Squares Approach, 2001.
[9] Sean P. Meyn. Sequencing and Routing in Multiclass Queueing Networks Part I: Feedback Regulation, 2001, SIAM J. Control. Optim..
[10] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 2001, IEEE Trans. Autom. Control..
[11] Dale Schuurmans, et al. Direct value-approximation for factored MDPs, 2001, NIPS.
[12] J. Baxter, et al. Direct gradient-based reinforcement learning, 2000, Proc. IEEE International Symposium on Circuits and Systems (ISCAS).
[13] John N. Tsitsiklis, et al. Call admission control and routing in integrated services networks using neuro-dynamic programming, 2000, IEEE Journal on Selected Areas in Communications.
[14] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[15] Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control. Optim..
[16] J. R. Morrison, et al. New Linear Program Performance Bounds for Queueing Networks, 1999.
[17] R. Dudley, et al. Uniform Central Limit Theorems, 1999.
[18] Christine A. Shoemaker, et al. Applying Experimental Design and Regression Splines to High-Dimensional Continuous-State Stochastic Dynamic Programming, 1999, Oper. Res..
[19] Vijay R. Konda, et al. Actor-Critic Algorithms, 1999, NIPS.
[20] Geoffrey J. Gordon, et al. Approximate solutions to Markov decision processes, 1999.
[21] Sean P. Meyn, et al. Value iteration and optimization of multiclass queueing networks, 1998, Proc. 37th IEEE Conference on Decision and Control.
[22] Andrew W. Moore, et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.
[23] Benjamin Van Roy. Learning and value function approximation in complex decision processes, 1998.
[24] Mathukumalli Vidyasagar. A Theory of Learning and Generalization, 1997.
[25] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[26] Thomas G. Dietterich, et al. High-Performance Job-Shop Scheduling With A Time-Delay TD(λ) Network, 1995, NIPS.
[27] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[28] Christopher M. Bishop, et al. Neural networks for pattern recognition, 1995.
[29] R. Durrett. Probability: Theory and Examples, 1993.
[30] V. Borkar. A convex analytic approach to Markov decision processes, 1988.
[31] P. Schweitzer, et al. Generalized polynomial approximations in Markovian decision processes, 1985.
[32] A. Hordijk, et al. Linear Programming and Markov Decision Chains, 1979.
[33] E. Denardo. On Linear Programming in a Markov Decision Problem, 1970.
[34] A. F. Veinott. Discrete Dynamic Programming with Sensitive Discount Optimality Criteria, 1969.
[35] D. Luenberger. Optimization by Vector Space Methods, 1968.
[36] F. d'Epenoux, et al. A Probabilistic Production and Inventory Problem, 1963.