Paper Information - Learning and Reusing Goal-Specific Policies for Goal-Driven Autonomy

Learning and Reusing Goal-Specific Policies for Goal-Driven Autonomy

In certain adversarial environments, reinforcement learning (RL) techniques require a prohibitively large number of episodes to learn a high-performing strategy for action selection. For example, Q-learning is particularly slow to learn a policy to win complex strategy games. We propose GRL, the first goal-driven autonomy (GDA) system capable of learning and reusing goal-specific policies. GRL is a case-based GDA agent embedded in the RL cycle. GRL acquires and reuses cases that capture episodic knowledge about an agent’s (1) expectations, (2) goals to pursue when these expectations are not met, and (3) actions for achieving these goals in given states. Our hypothesis is that, unlike RL, GRL can rapidly fine-tune strategies by exploiting the episodic knowledge captured in its cases. We report performance gains versus a state-of-the-art GDA agent and an RL agent for challenging tasks in two real-time video game domains.
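To make the three kinds of knowledge named in the abstract more concrete, the following is a minimal sketch, not GRL's actual algorithm or case representation, which the abstract does not specify. It assumes a tabular setting with one Q-table per goal (standing in for goal-specific policies) and a dictionary that maps an (expected state, observed state) discrepancy to a new goal (standing in for the goal-selection cases). All names here (`GoalSpecificQ`, `select_goal`, `fallback_goals`, the state and goal labels) are hypothetical.

```python
import random
from collections import defaultdict


class GoalSpecificQ:
    """One tabular Q-function per goal (illustrative assumption only)."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        # q[goal][(state, action)] -> estimated value, default 0.0
        self.q = defaultdict(lambda: defaultdict(float))

    def best_action(self, goal, state):
        """Greedy action for this goal; ties go to the first listed action."""
        table = self.q[goal]
        return max(self.actions, key=lambda a: table[(state, a)])

    def choose(self, goal, state):
        """Epsilon-greedy action selection using the table for this goal."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(goal, state)

    def update(self, goal, state, action, reward, next_state):
        """Standard Q-learning update, applied only to the active goal's table."""
        table = self.q[goal]
        best_next = max(table[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        table[(state, action)] += self.alpha * (td_target - table[(state, action)])


def select_goal(current_goal, expected_state, observed_state, fallback_goals):
    """Keep the goal when the expectation is met; otherwise look up a new goal
    stored for this discrepancy (hypothetical stand-in for case retrieval)."""
    if expected_state == observed_state:
        return current_goal
    return fallback_goals.get((expected_state, observed_state), current_goal)
```

In this sketch, experience gathered while pursuing one goal updates only that goal's table, so a policy learned earlier for a goal can be reused as soon as a discrepancy causes the agent to switch back to it.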
