Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery
[1] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.
[2] Nicola Secomandi, et al. Comparing neuro-dynamic programming algorithms for the vehicle routing problem with stochastic demands, 2000, Comput. Oper. Res.
[3] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[4] Malcolm J. A. Strens, et al. Combining Planning with Reinforcement Learning for Multi-robot Task Allocation, 2004, Adaptive Agents and Multi-Agent Systems.
[5] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[6] Michel Gendreau, et al. Vehicle Routing Problem with Time Windows, Part II: Metaheuristics, 2005, Transp. Sci.
[7] Michail G. Lagoudakis, et al. Coordinated Reinforcement Learning, 2002, ICML.
[8] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[9] Nicola Secomandi, et al. A Rollout Policy for the Vehicle Routing Problem with Stochastic Demands, 2001, Oper. Res.
[10] Prasad Tadepalli, et al. Model-Based Average Reward Reinforcement Learning, 1998, Artif. Intell.
[11] Warren B. Powell, et al. Approximate dynamic programming for high dimensional resource allocation problems, 2005.
[12] Benjamin Van Roy, et al. A neuro-dynamic programming approach to retailer inventory management, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[13] Sridhar Mahadevan, et al. Learning to communicate and act using hierarchical reinforcement learning, 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004.