Paper Information - Instructing a Reinforcement Learner

Instructing a Reinforcement Learner

In reinforcement learning (RL), rewards have been considered the most important feedback for understanding the environment. Recently, however, there have been interesting forays into other modes of feedback, such as sporadic supervisory inputs, which bring richer information about the world of interest into the learning process. In this paper, we model these supervisory inputs as specific types of instructions that provide information in the form of an expert’s control decision and certain structural regularities in the state space. We further provide a mathematical formulation of these instructions and propose a framework to incorporate them into the learning process.

Balaraman Ravindran | Pradyot V. N. Korupolu | Manimaran Sivasamy Sivamurugan
