Curriculum Design for Machine Learners in Sequential Decision Tasks

Existing work in machine learning has shown that algorithms can benefit from curricula: learning first on simple examples before moving on to more difficult problems. This work studies the curriculum-design problem in the context of sequential decision tasks, analyzing how different curricula affect learning in a Sokoban-like domain and presenting the results of a user study that explores whether nonexperts can generate effective curricula. Our results show that 1) the way in which evaluative feedback is given to the agent as it learns individual tasks does not affect the relative quality of different curricula, 2) nonexpert users can design curricula that yield better overall performance than having the agent learn from scratch, and 3) nonexpert users can discover and follow salient principles when selecting tasks for a curriculum. We also demonstrate that our curriculum-learning algorithm can be improved by incorporating the principles people use when designing curricula. This work offers insights into the development of new machine-learning algorithms and interfaces that can better accommodate machine- or human-created curricula.

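To make the curriculum idea concrete, here is a minimal sketch of curriculum learning for a tabular Q-learner. The corridor tasks, task lengths, and hyperparameters are illustrative assumptions, not the paper's Sokoban-like domain or its algorithm: the agent trains on a sequence of increasingly long corridor tasks and carries its Q-table from one task to the next, rather than learning the hardest task from scratch.

```python
# Minimal curriculum-learning sketch with tabular Q-learning (illustrative only).
# Tasks are 1-D corridors of increasing length; the goal is the rightmost cell.
# The Q-table learned on easy (short) tasks is reused on harder (longer) ones.
import random
from collections import defaultdict

ACTIONS = (-1, +1)  # step left / step right

def q_learn(length, Q, episodes=200, alpha=0.1, gamma=0.95, eps=0.1):
    """Q-learning on a corridor with `length` cells; returns the updated table."""
    for _ in range(episodes):
        s = 0
        while s != length - 1:
            # Epsilon-greedy action selection.
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2 = min(max(s + a, 0), length - 1)      # clip to corridor bounds
            r = 1.0 if s2 == length - 1 else -0.01   # goal reward, small step cost
            target = r + gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

# Curriculum: simplest task first, transferring Q-values to each harder task.
Q = defaultdict(float)
for task_length in (4, 8, 16):
    Q = q_learn(task_length, Q)

# Baseline for comparison: learn the hardest task from scratch.
Q_scratch = q_learn(16, defaultdict(float))
```

The comparison at the end mirrors the study's central contrast: an agent trained through a sequence of source tasks (machine- or human-designed) versus one trained only on the target task.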