A reinforcement learning approach to personalized learning recommendation systems

Personalized learning refers to instruction in which the pace of learning and the instructional approach are optimized for the needs of each learner. With recent advances in information technology and data science, personalized learning is becoming possible for anyone with a personal computer, supported by a data-driven recommendation system that automatically schedules the learning sequence. The engine of such a recommendation system is a recommendation strategy that uses data from other learners and the performance of the current learner to recommend suitable learning materials that optimize certain learning outcomes. A powerful engine balances exploitation (making the best possible recommendation given current knowledge) against exploration (trying new learning trajectories that may pay off later). Building such an engine is a challenging task. We formulate this problem within the Markov decision process framework and propose a reinforcement learning approach to solving it.
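To make the Markov decision formulation and the exploration/exploitation balance concrete, the following is a minimal sketch under stated assumptions. It is not the authors' method: it uses generic tabular Q-learning with epsilon-greedy exploration on a hypothetical toy learner. The state is a tuple of 0/1 mastery flags for three skills, the action is which of three materials to recommend, skill k can only be learned after skill k-1 (an assumed prerequisite chain), and `LEARN_PROB`, `step`, `q_learning`, and `greedy_policy` are illustrative names invented here. The reward is +1 for each newly mastered skill, so discounting favours recommendation sequences that reach mastery quickly.

```python
import random
from collections import defaultdict

# Hypothetical toy learner: skill k can only be learned once skill k-1 is mastered.
# LEARN_PROB[k] is an assumed chance that studying material k masters skill k.
LEARN_PROB = [0.6, 0.5, 0.4]
N_SKILLS = len(LEARN_PROB)


def step(state, action, rng):
    """Simulate one study session. state is a tuple of 0/1 mastery flags;
    action is the index of the recommended material (one material per skill)."""
    mastered = list(state)
    prereq_ok = action == 0 or mastered[action - 1] == 1
    if not mastered[action] and prereq_ok and rng.random() < LEARN_PROB[action]:
        mastered[action] = 1
    next_state = tuple(mastered)
    reward = sum(next_state) - sum(state)  # +1 for each newly mastered skill
    done = all(next_state)                 # episode ends once every skill is mastered
    return next_state, reward, done


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the current Q estimates."""
    if rng.random() < epsilon:
        return rng.randrange(N_SKILLS)
    best = max(q[state])
    # Break ties among equally valued materials at random.
    return rng.choice([a for a, v in enumerate(q[state]) if v == best])


def q_learning(episodes=10000, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=50, seed=0):
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * N_SKILLS)  # Q[state][action], initialised to 0
    for _ in range(episodes):
        state = (0,) * N_SKILLS                # each simulated learner starts with no skills
        for _ in range(max_steps):
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, action, rng)
            # One-step Q-learning target; terminal states have value 0.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Recommendation rule: the material with the highest estimated value in each visited state."""
    return {s: max(range(N_SKILLS), key=lambda a: q[s][a]) for s in q}


if __name__ == "__main__":
    policy = greedy_policy(q_learning())
    for state in sorted(policy):
        print(state, "-> recommend material", policy[state])
```

Under these assumptions the learned greedy policy should follow the prerequisite chain, recommending material 0 first, then 1, then 2. The parameter `epsilon` sets how often the sketch explores a random material instead of exploiting its current estimates; a real recommendation engine would replace the toy simulator with a learner model estimated from data and would typically need function approximation rather than a lookup table.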
