Multi-Armed Bandits for Intelligent Tutoring Systems

We present an approach to Intelligent Tutoring Systems that adaptively personalizes sequences of learning activities to maximize the skills students acquire, taking into account their limited time and motivational resources. At each point in time, the system proposes to the student the activity expected to yield the fastest progress. We introduce two algorithms that rely on empirical estimation of learning progress: RiARiT, which uses information about the difficulty of each exercise, and ZPDES, which uses much less knowledge about the problem. The system combines three approaches. First, it leverages recent models of intrinsically motivated learning by transposing them to active teaching, relying on empirical estimation of the learning progress that specific activities provide to particular students. Second, it uses state-of-the-art Multi-Armed Bandit (MAB) techniques to efficiently manage the exploration/exploitation trade-off of this optimization process. Third, it leverages expert knowledge to constrain and bootstrap the initial exploration of the MAB, while requiring only coarse guidance from the expert and allowing the system to deal with didactic gaps in its knowledge. The system is evaluated in a scenario where 7- to 8-year-old schoolchildren learn how to decompose numbers while manipulating money. Systematic experiments with simulated students are presented, followed by the results of a user study with a population of 400 schoolchildren.
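The core idea of the abstract, choosing activities with a bandit whose reward is an empirical estimate of learning progress, can be illustrated with a minimal sketch. This is not the RiARiT or ZPDES algorithm: the class name, the window size, the exploration rate, and the specific progress estimate (recent success rate minus older success rate over a sliding window) are all assumptions made for illustration.

```python
import random
from collections import deque


class LearningProgressBandit:
    """Toy bandit (hypothetical) that favors activities with high recent progress."""

    def __init__(self, activities, window=4, explore=0.2, seed=0):
        self.activities = list(activities)
        self.explore = explore  # assumed share of uniform exploration
        self.rng = random.Random(seed)
        # Keep only the last `window` answers (1.0 correct, 0.0 wrong) per activity.
        self.history = {a: deque(maxlen=window) for a in self.activities}

    def learning_progress(self, activity):
        """Assumed estimate: recent success rate minus older success rate."""
        answers = list(self.history[activity])
        if len(answers) < 2:
            return 0.0
        half = len(answers) // 2
        older, recent = answers[:half], answers[half:]
        return sum(recent) / len(recent) - sum(older) / len(older)

    def probabilities(self):
        """Mix progress-proportional sampling with uniform exploration."""
        # Negative progress is clipped to zero so weights stay valid.
        weights = [max(self.learning_progress(a), 0.0) for a in self.activities]
        total = sum(weights)
        uniform = 1.0 / len(self.activities)
        if total == 0.0:
            return [uniform] * len(self.activities)
        return [
            (1.0 - self.explore) * w / total + self.explore * uniform
            for w in weights
        ]

    def choose(self):
        """Sample the next activity to propose to the student."""
        return self.rng.choices(self.activities, weights=self.probabilities(), k=1)[0]

    def update(self, activity, correct):
        """Record whether the student answered the activity correctly."""
        self.history[activity].append(1.0 if correct else 0.0)


if __name__ == "__main__":
    bandit = LearningProgressBandit(["easy", "medium", "hard"])
    for correct in [True, True, True, True]:
        bandit.update("easy", correct)      # already mastered: no progress
    for correct in [False, False, True, True]:
        bandit.update("medium", correct)    # improving: high progress
    for correct in [False, False, False, False]:
        bandit.update("hard", correct)      # too difficult: no progress
    print(bandit.probabilities())
    print(bandit.choose())
```

In this sketch an activity the student has already mastered and one that is still out of reach both show zero progress, so most of the sampling mass goes to the activity where the success rate is rising, while the uniform share keeps every activity occasionally explored.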
