Deep Reinforcement Learning for Adaptive Learning Systems

In this paper, we formulate the adaptive learning problem faced by adaptive learning systems as a Markov decision process (MDP). The adaptive learning problem is that of finding an individualized learning plan (called a policy) that chooses the most appropriate learning materials based on a learner's latent traits. We assume that the latent traits are continuous and that their transition model is unknown. We apply a model-free deep reinforcement learning algorithm, deep Q-learning, which can effectively find the optimal learning policy from data on learners' learning processes without knowing the actual transition model of the learners' continuous latent traits. To use the available data efficiently, we also develop a transition model estimator that emulates the learner's learning process using neural networks. The transition model estimator can be incorporated into the deep Q-learning algorithm so that the algorithm discovers the optimal learning policy for a learner more efficiently. Numerical simulation studies verify that the proposed algorithm is efficient in finding a good learning policy; in particular, with the aid of the transition model estimator, it can find the optimal learning policy after training on a small number of learners.
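The following is a minimal sketch, not the authors' implementation, of how Q-learning on a continuous latent trait can be combined with a learned transition model. It uses Dyna-style planning, one standard way to let an estimated model generate extra training transitions. Everything concrete here is a hypothetical assumption for illustration: a scalar latent trait `theta` that the agent observes directly, three learning materials with assumed difficulties `DIFFICULTIES`, the hidden learner dynamics in `true_step`, and a reward equal to the gain in the trait. To stay dependency-free, both neural networks are swapped for linear models on the features [1, θ, θ²]. `LinearQ` stands in for the deep Q-network, and `TransitionModel` stands in for the neural-network transition model estimator. The TD error is clipped to [-1, 1], as is common in DQN implementations.

```python
import math
import random

# --- Hypothetical setup (assumed for illustration, not from the paper) ---
DIFFICULTIES = [-1.0, 0.0, 1.0]  # assumed difficulty of each learning material
HORIZON = 5                      # learning steps per simulated learner
GAMMA = 0.9                      # discount factor


def true_step(theta, action, rng):
    """Assumed learner dynamics, hidden from the agent: a material helps most
    when its difficulty is close to the learner's current latent trait."""
    gain = 0.3 * math.exp(-(theta - DIFFICULTIES[action]) ** 2)
    return theta + gain + rng.gauss(0.0, 0.02)


def features(theta):
    """Quadratic features of the continuous latent trait."""
    return [1.0, theta, theta * theta]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


class LinearQ:
    """Q(theta, a) = w_a . phi(theta); a linear stand-in for the deep Q-network."""

    def __init__(self, n_actions):
        self.w = [[0.0] * 3 for _ in range(n_actions)]

    def value(self, theta, a):
        return dot(self.w[a], features(theta))

    def greedy(self, theta):
        return max(range(len(self.w)), key=lambda a: self.value(theta, a))

    def update(self, theta, a, reward, next_theta, lr=0.02):
        # Semi-gradient Q-learning step with the TD error clipped to [-1, 1].
        target = reward + GAMMA * self.value(next_theta, self.greedy(next_theta))
        err = max(-1.0, min(1.0, target - self.value(theta, a)))
        self.w[a] = [wi + lr * err * xi for wi, xi in zip(self.w[a], features(theta))]


class TransitionModel:
    """Per-action linear regression of the trait gain on phi(theta); a
    stand-in for the neural-network transition model estimator."""

    def __init__(self, n_actions):
        self.w = [[0.0] * 3 for _ in range(n_actions)]

    def predict(self, theta, a):
        return dot(self.w[a], features(theta))

    def update(self, theta, a, delta, lr=0.02):
        err = delta - self.predict(theta, a)  # least-mean-squares step
        self.w[a] = [wi + lr * err * xi for wi, xi in zip(self.w[a], features(theta))]


def train(n_learners=200, planning_steps=5, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_actions = len(DIFFICULTIES)
    q, model, seen = LinearQ(n_actions), TransitionModel(n_actions), []
    for _ in range(n_learners):
        theta = rng.uniform(-2.0, 0.0)  # assumed initial-trait distribution
        for _ in range(HORIZON):
            # Epsilon-greedy choice of the next learning material.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = q.greedy(theta)
            next_theta = true_step(theta, a, rng)
            reward = next_theta - theta            # reward = gain in the latent trait
            q.update(theta, a, reward, next_theta)  # learn from the real transition
            model.update(theta, a, reward)          # fit the transition model
            seen.append(theta)
            # Dyna-style planning: extra Q updates on model-simulated transitions.
            for _ in range(planning_steps):
                s = rng.choice(seen)
                b = rng.randrange(n_actions)
                delta = model.predict(s, b)
                q.update(s, b, delta, s + delta)
            theta = next_theta
    return q, model, seen


q, model, seen = train()
print([q.greedy(t) for t in (-1.5, -0.5, 0.5)])  # material chosen at each trait level
```

Setting `planning_steps=0` turns the sketch into plain model-free Q-learning, so running both settings on the same number of simulated learners is one way to explore what the transition model adds; no particular outcome is claimed for this toy setup.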
