Importance Sampling to Identify Empirically Valid Policies and their Critical Decisions

In this work, we investigate off-policy policy evaluation (OPE) metrics for evaluating Reinforcement Learning (RL)-induced policies and for identifying critical decisions in the context of Intelligent Tutoring Systems (ITSs). We explore three common Importance Sampling-based OPE metrics in two deployment settings to evaluate four RL-induced policies for a logic ITS. The two deployment settings examine the impact of using original versus normalized rewards and the impact of transforming deterministic policies into stochastic ones. Our results show that Per-Decision Importance Sampling (PDIS), using softmax and original rewards, is the best metric and the only one that reached 100% alignment between the theoretical and empirical classroom evaluation results. Furthermore, we used PDIS to identify what we call critical decisions in RL-induced policies, where the policies successfully identify large differences between decisions. We found that students who received more critical decisions significantly outperformed those who received fewer; more importantly, this result holds only for the policy that PDIS identified as effective, not for the ineffective ones.
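To make the PDIS metric concrete, the following is a minimal sketch of the standard per-decision importance sampling estimator, together with one possible softmax transformation from action values to a stochastic policy. It is not the authors' implementation: the function names (`softmax_policy`, `pdis`), the `temperature` parameter, the trajectory layout as `(state, action, reward)` tuples, and the toy data are all assumptions made for illustration.

```python
import math


def softmax_policy(q_values, temperature=1.0):
    """Map a list of action values to action probabilities (assumed form)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def pdis(trajectories, target_prob, behavior_prob, gamma=1.0):
    """Per-decision importance sampling estimate, averaged over trajectories.

    trajectories: list of trajectories, each a list of (state, action, reward).
    target_prob(s, a): probability of a in s under the policy being evaluated.
    behavior_prob(s, a): probability of a in s under the data-collecting policy.
    """
    total = 0.0
    for traj in trajectories:
        weight = 1.0  # cumulative importance ratio up to the current step
        value = 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= target_prob(s, a) / behavior_prob(s, a)
            value += (gamma ** t) * weight * r
        total += value
    return total / len(trajectories)


# Hypothetical toy data: one state, two actions, uniform behavior policy.
q_table = {"s": [1.0, 0.0]}  # assumed action values for illustration only
data = [[("s", 0, 1.0), ("s", 1, 2.0)], [("s", 0, 0.5)]]

estimate = pdis(
    data,
    target_prob=lambda s, a: softmax_policy(q_table[s])[a],
    behavior_prob=lambda s, a: 0.5,
)
```

Each reward is weighted only by the importance ratios of the decisions made up to that step, which is what distinguishes PDIS from ordinary trajectory-level importance sampling. The softmax step matters because a deterministic target policy assigns probability zero to every action it would not take, which drives the weight of most logged trajectories to zero.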
