Curriculum Design for Machine Learners in Sequential Decision Tasks

Existing work in machine learning has shown that algorithms can benefit from curricula: learning first on simple examples before moving on to more difficult problems. This work studies the curriculum-design problem in the context of sequential decision tasks, analyzing how different curricula affect learning in a Sokoban-like domain and presenting the results of a user study that explores whether nonexperts can generate effective curricula. Our results show that 1) the way in which evaluative feedback is given to the agent as it learns individual tasks does not affect the relative quality of different curricula, 2) nonexpert users can design curricula that yield better overall performance than having the agent learn from scratch, and 3) nonexpert users can discover and follow salient principles when selecting tasks for a curriculum. We also demonstrate that our curriculum-learning algorithm can be improved by incorporating the principles people use when designing curricula. This work offers insights into the development of new machine-learning algorithms and interfaces that can better accommodate machine- or human-created curricula.

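To make the curriculum idea concrete, here is a minimal sketch of curriculum learning for a tabular Q-learner. The corridor tasks, task lengths, and hyperparameters are illustrative assumptions, not the paper's Sokoban-like domain or its algorithm: the agent trains on a sequence of increasingly long corridor tasks and carries its Q-table from one task to the next, rather than learning the hardest task from scratch.

```python
# Minimal curriculum-learning sketch with tabular Q-learning (illustrative only).
# Tasks are 1-D corridors of increasing length; the goal is the rightmost cell.
# The Q-table learned on easy (short) tasks is reused on harder (longer) ones.
import random
from collections import defaultdict

ACTIONS = (-1, +1)  # step left / step right

def q_learn(length, Q, episodes=200, alpha=0.1, gamma=0.95, eps=0.1):
    """Q-learning on a corridor with `length` cells; returns the updated table."""
    for _ in range(episodes):
        s = 0
        while s != length - 1:
            # Epsilon-greedy action selection.
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2 = min(max(s + a, 0), length - 1)      # clip to corridor bounds
            r = 1.0 if s2 == length - 1 else -0.01   # goal reward, small step cost
            target = r + gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

# Curriculum: simplest task first, transferring Q-values to each harder task.
Q = defaultdict(float)
for task_length in (4, 8, 16):
    Q = q_learn(task_length, Q)

# Baseline for comparison: learn the hardest task from scratch.
Q_scratch = q_learn(16, defaultdict(float))
```

The comparison at the end mirrors the study's central contrast: an agent trained through a sequence of source tasks (machine- or human-designed) versus one trained only on the target task.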