Hierarchical Reinforcement Learning for Pedagogical Policy Induction (Extended Abstract)

In interactive e-learning environments such as Intelligent Tutoring Systems, there are pedagogical decisions to make at two main levels of granularity: whole problems and single steps. In recent years, there has been growing interest in applying data-driven techniques for adaptive decision making that can dynamically tailor students' learning experiences. Most existing data-driven approaches, however, treat these pedagogical decisions equally, or independently, disregarding the long-term impact that tutor decisions may have across these two levels of granularity. In this paper, we propose and apply an offline, Gaussian Process-based Hierarchical Reinforcement Learning (HRL) framework to induce a hierarchical pedagogical policy that makes decisions at both the problem and step levels. An empirical classroom study shows that the HRL policy is significantly more effective than a Deep Q-Network (DQN)-induced policy and a random yet reasonable baseline policy.
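To make the two-level decision structure concrete, the following is a minimal sketch of how such a hierarchical policy might be queried at run time. The action names (worked_example vs. problem_solving at the problem level; elicit vs. tell at the step level) and the greedy table-lookup form are illustrative assumptions common in the tutoring literature, not the paper's exact action space or implementation.

```python
# A minimal sketch of a two-level hierarchical pedagogical policy.
# Assumption: Q-value estimates for both levels were induced offline
# (e.g., via Gaussian Process regression over logged tutor data); here
# they are stored as plain dictionaries for illustration only.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

State = Tuple[float, ...]  # discretized student-state features


@dataclass
class HierarchicalPolicy:
    # Hypothetical offline-estimated Q-values for problem-level options
    # and step-level primitive actions.
    problem_q: Dict[Tuple[State, str], float] = field(default_factory=dict)
    step_q: Dict[Tuple[State, str], float] = field(default_factory=dict)
    problem_actions: List[str] = field(
        default_factory=lambda: ["worked_example", "problem_solving"]
    )
    step_actions: List[str] = field(
        default_factory=lambda: ["elicit", "tell"]
    )

    def choose_problem_action(self, s: State) -> str:
        # Problem-level decision: how to present the next whole problem.
        return max(self.problem_actions,
                   key=lambda a: self.problem_q.get((s, a), 0.0))

    def choose_step_action(self, s: State) -> str:
        # Step-level decision, made repeatedly within a problem.
        return max(self.step_actions,
                   key=lambda a: self.step_q.get((s, a), 0.0))
```

In this sketch, the tutor would call choose_problem_action once per problem; if the chosen option involves step-by-step solving, it would then call choose_step_action before each step. This mirrors the options-over-primitive-actions view of temporal abstraction, where the problem-level choice constrains a sequence of step-level decisions.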
