A Cost-Shaping Linear Program for Average-Cost Approximate Dynamic Programming with Performance Guarantees
[1] A. S. Manne. Linear Programming and Sequential Decisions, 1960.
[2] F. d'Epenoux, et al. A Probabilistic Production and Inventory Problem, 1963.
[3] Moshe Ben-Horim, et al. A Linear Programming Approach, 1977.
[4] P. Schweitzer, et al. Generalized polynomial approximations in Markovian decision processes, 1985.
[5] N. Kartashov. Inequalities in Theorems of Ergodicity and Stability for Markov Chains with Common Phase Space. I, 1986.
[6] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[7] Michael A. Trick, et al. A Linear Programming Approach to Solving Stochastic Dynamic Programs, 1993.
[8] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[9] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[10] A. Harry Klopf, et al. Advantage Updating Applied to a Differential Game, 1994, NIPS.
[11] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[12] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[13] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[14] Sean P. Meyn. The policy iteration algorithm for average reward Markov decision processes with general state space, 1997, IEEE Trans. Autom. Control.
[15] Stanley E. Zin, et al. Spline Approximations to Value Functions: Linear Programming Approach, 1997.
[16] J. R. Morrison, et al. New Linear Program Performance Bounds for Queueing Networks, 1999.
[17] Geoffrey J. Gordon, et al. Approximate solutions to Markov decision processes, 1999.
[18] Sean P. Meyn, et al. Value iteration and optimization of multiclass queueing networks, 1999, Queueing Syst. Theory Appl.
[19] Dale Schuurmans, et al. Direct value-approximation for factored MDPs, 2001, NIPS.
[20] Vivek S. Borkar, et al. Convex Analytic Methods in Markov Decision Processes, 2002.
[21] Benjamin Van Roy, et al. Approximate Linear Programming for Average-Cost Dynamic Programming, 2002, NIPS.
[22] Shobha Venkataraman, et al. Efficient Solution Algorithms for Factored MDPs, 2003, J. Artif. Intell. Res.
[23] D. Koller, et al. Planning under uncertainty in complex structured environments, 2003.
[24] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[25] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[26] Sean P. Meyn, et al. Performance Evaluation and Policy Selection in Multiclass Networks, 2003, Discret. Event Dyn. Syst.
[27] Milos Hauskrecht, et al. Linear Program Approximations for Factored Continuous-State Markov Decision Processes, 2003, NIPS.
[28] Milos Hauskrecht, et al. Solving Factored MDPs with Continuous and Discrete Variables, 2004, UAI.
[29] Approximate Dynamic Programming for Networks: Fluid Models and Constraint Reduction, 2004.
[30] Benjamin Van Roy, et al. On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming, 2004, Math. Oper. Res.
[31] Daniel Adelman, et al. A Price-Directed Approach to Stochastic Inventory/Routing, 2004, Oper. Res.
[32] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.
[33] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 1998, Machine Learning.
[34] Sean P. Meyn. Workload models for stochastic networks: value functions and performance evaluation, 2005, IEEE Transactions on Automatic Control.
[35] Benjamin Van Roy, et al. Tetris: A Study of Randomized Constraint Sampling, 2006.