Paper information - Language Learning by a "Team" (Extended Abstract)


Traditionally, research in the reinforcement learning (RL) community has been devoted to developing domain-independent algorithms, such as SARSA [13], Q-learning [16], prioritized sweeping [8], or LSPI [6], that are designed to work for any given state space and action space. However, the modus operandi in RL research has been for a human expert to re-code each learning environment, including defining the actions and state features, as well as specifying the algorithm to be used. Typically, each new RL experiment is run by explicitly calling a new program (even when learning can be biased by previous learning experiences, as in transfer learning [10, 15, 14]). Thus, while standards have developed for describing and testing individual RL algorithms (e.g., RL-Glue [17]), no such standards have developed for the problem of describing complete tasks to a preexisting agent.

In this paper we present a new language for specifying complete tasks, and a framework for agents to learn a new policy for solving these tasks. This language was designed for the large, multi-year, multi-institution “Bootstrap Learning” (BL) project [1], which aims to enable humans to teach agents multiple different tasks using different instructional techniques or sources of training data. Our “BL Task Learning” (BLTL) language, which is specific to sequential decision-making tasks, allows the human teacher to specify starting states, reward functions, termination conditions, advice [7, 5], or demonstrations [9] to the agent; to indicate relevant previous experience (enabling transfer learning [10, 15, 14]); to use previously taught tasks as primitive actions in new tasks; or to specify portions of a more complex task that are to be refined by learning (enabling task decomposition [2, 3, 12]).
Because parts of larger tasks can each be learned with a different technique, we enable multiple learning methods to be synergistically integrated in a single agent that is far more capable than an agent using any one learning algorithm.

Cite as: A General Task Specification Language for Bootstrap Learning (Extended Abstract), Ian Fasel, Michael Quinlan, Peter Stone, Proc. of 8th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2009), Decker, Sichman, Sierra and Castelfranchi (eds.), May, 10–15, 2009, Budapest, Hungary, pp. XXX-XXX. Copyright © 2009, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

2. LANGUAGE PRIMITIVES

Arun Sharma | Sanjay Jain

[1] Carl H. Smith, et al. The Power of Pluralism for Automatic Program Synthesis, 1982, JACM.

[2] John Case, et al. Comparison of Identification Criteria for Machine Inductive Inference, 1983, Theor. Comput. Sci.

[3] Carl H. Smith, et al. Probability and Plurality for Aggregations of Learning Machines, 1988, Inf. Comput.

[4] Chris Watkins, et al. Learning from delayed rewards, 1989.

[5] Manuel Blum, et al. Toward a Mathematical Theory of Inductive Inference, 1975, Inf. Control.

[6] John Case, et al. Machine Inductive Inference and Language Identification, 1982, ICALP.

[7] William I. Gasarch, et al. Learning via queries to an oracle, 1989, COLT '89.

[8] Leonard Pitt, et al. Probabilistic inductive inference, 1989, JACM.

[9] Paul Young, et al. An introduction to the general theory of algorithms, 1978.

[10] Kenneth Wexler, et al. Formal Principles of Language Acquisition, 1980.

[11] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.