A Q-decomposition and bounded RTDP approach to resource allocation
Brahim Chaib-draa | Pierrick Plamondon | Abder Rezak Benaskeur
[1] Satinder P. Singh, et al. How to Dynamically Merge Markov Decision Processes, 1997, NIPS.
[2] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[3] Geoffrey J. Gordon, et al. Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees, 2005, ICML.
[4] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[5] Shlomo Zilberstein, et al. LAO*: A heuristic search algorithm that finds solutions with loops, 2001, Artif. Intell.
[6] Blai Bonet, et al. Faster Heuristic Search Algorithms for Planning with Uncertainty and Full Feedback, 2003, IJCAI.
[7] Abdel-Illah Mouaddib, et al. An Iterative Algorithm for Solving Constrained Decentralized Markov Decision Processes, 2006, AAAI.
[8] Blai Bonet, et al. Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming, 2003, ICAPS.
[9] Weixiong Zhang, et al. Modeling and Solving a Resource Allocation Problem with Soft Constraint Techniques, 2002.
[10] Reid G. Simmons, et al. Focused Real-Time Dynamic Programming for MDPs: Squeezing More Out of a Heuristic, 2006, AAAI.
[11] Stuart J. Russell, et al. Q-Decomposition for Reinforcement Learning Agents, 2003, ICML.