论文信息 - Exploitation-Oriented Learning PS-r# - 字舞流文

Exploitation-Oriented Learning PS-r#

Shigenobu Kobayashi | Kazuteru Miyazaki | K. Miyazaki | S. Kobayashi

[1] Andrew Y. Ng,et al. Pharmacokinetics of a novel formulation of ivermectin after administration to goats , 2000, ICML.

[2] Michael I. Jordan,et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.

[3] Shigenobu Kobayashi,et al. Reinforcement Learning for Penalty Avoidance in Continuous State Spaces , 2007, J. Adv. Comput. Intell. Intell. Informatics.

[4] Pieter Abbeel,et al. Exploration and apprenticeship learning in reinforcement learning , 2005, ICML.

[5] Michael I. Jordan,et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes , 1994, ICML.

[6] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[7] Craig Boutilier,et al. Computing Optimal Policies for Partially Observable Decision Processes Using Compact Representations , 1996, AAAI/IAAI, Vol. 2.

[8] Andrew McCallum,et al. Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State , 1995, ICML.

[9] Theodore J. Perkins,et al. Reinforcement learning for POMDPs based on action values and stochastic optimization , 2002, AAAI/IAAI.

[10] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[11] Shigenobu Kobayashi,et al. Reinforcement Learning by Stochastic Hill Climbing on Discounted Reward , 1995, ICML.

[12] Douglas Aberdeen,et al. Scalable Internal-State Policy-Gradient Methods for POMDPs , 2002, ICML.

[13] Shigenobu Kobayashi,et al. Reinforcement learning for penalty avoiding policy making , 2000, Smc 2000 conference proceedings. 2000 ieee international conference on systems, man and cybernetics. 'cybernetics evolving to systems, humans, organizations, and their complex interactions' (cat. no.0.

[14] Shigenobu Kobayashi,et al. Reinforcement Learning in POMDPs with Function Approximation , 1997, ICML.

[15] Shigenobu Kobayashi,et al. An Extension of Profit Sharing to Partially Observable Markov Decision Processes: Proposition of PS-r* and its Evaluation , 2003 .

[16] Lonnie Chrisman,et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach , 1992, AAAI.

[17] Kathryn E. Merrick,et al. Motivated reinforcement learning for adaptive characters in open-ended simulation games , 2007, ACE '07.