Automatically Learning Topics and Difficulty Levels of Problems in Online Judge Systems

Online Judge (OJ) systems are widely used in many areas, including programming, mathematical problem solving, and job interviews. Unlike other online learning systems, such as Massive Open Online Courses (MOOCs), most OJ systems are designed for self-directed learning without the intervention of teachers. Moreover, most OJ systems simply list problems in volumes, with no clear organization by topic or difficulty level, so problems in the same volume are mixed in both respects. By analyzing large-scale learning traces of users, we observe two major learning modes (or patterns): users either practice problems from the same volume sequentially, regardless of topic, or attempt problems on the same topic, which may be spread across multiple volumes. This observation is consistent with findings in classic educational psychology. Based on it, we propose a novel two-mode Markov topic model that automatically detects the topics of online problems by jointly characterizing the two learning modes. To further predict the difficulty level of online problems, we propose a competition-based expertise model that uses the learned topic information. Extensive experiments on three large OJ datasets demonstrate the effectiveness of our approach on three tasks: skill topic extraction, expertise competition prediction, and problem recommendation.
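To make the idea of a competition-based expertise model concrete, the following minimal sketch treats each attempt as a match between a user and a problem: a solved attempt counts as a win for the user, a failed one as a win for the problem. It is an illustrative Elo-style logistic update under our own assumptions, not the model proposed in the paper. The function names, the step size `k`, the number of passes, and the choice to keep one skill value per (user, topic) pair are all hypothetical.

```python
import math


def expected_win(user_skill: float, problem_difficulty: float) -> float:
    """Logistic probability that the user solves the problem (assumed form)."""
    return 1.0 / (1.0 + math.exp(problem_difficulty - user_skill))


def update(user_skill: float, problem_difficulty: float,
           solved: bool, k: float = 0.1) -> tuple[float, float]:
    """One Elo-style step: the winner gains what the loser gives up."""
    p = expected_win(user_skill, problem_difficulty)
    outcome = 1.0 if solved else 0.0
    delta = k * (outcome - p)
    return user_skill + delta, problem_difficulty - delta


def fit(attempts, k: float = 0.1, epochs: int = 20):
    """Estimate per-(user, topic) skill and per-problem difficulty.

    `attempts` is a list of (user, problem, topic, solved) tuples, where
    `topic` is assumed to come from a previously learned topic model.
    """
    skills: dict[tuple[str, str], float] = {}
    difficulties: dict[str, float] = {}
    for _ in range(epochs):
        for user, problem, topic, solved in attempts:
            s = skills.get((user, topic), 0.0)
            d = difficulties.get(problem, 0.0)
            s, d = update(s, d, solved, k)
            skills[(user, topic)] = s
            difficulties[problem] = d
    return skills, difficulties


# Hypothetical toy data: everyone solves "p_easy", nobody solves "p_hard".
attempts = [
    ("u1", "p_easy", "graphs", True),
    ("u2", "p_easy", "graphs", True),
    ("u1", "p_hard", "graphs", False),
    ("u2", "p_hard", "graphs", False),
]
skills, difficulties = fit(attempts)
```

On this toy data the estimated difficulty of `p_hard` ends up above that of `p_easy`, which is the kind of ordering a difficulty model is meant to recover. Keying skill by topic is one simple way to let topic information shape the estimate; the paper's actual formulation may differ.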
