Programming Knowledge Tracing: A Comprehensive Dataset and A New Model

In this paper, we study knowledge tracing in the domain of programming education and make two main contributions. First, we collect and publish the most comprehensive dataset to date, BePKT, which covers a wide range of online behaviors in an online judge (OJ) system, including problem texts, knowledge annotations, user-submitted code, and system-logged events. Second, we propose a new model, PDKT, that exploits this enriched context for accurate prediction of student behavior. Specifically, we construct a bipartite graph for problem embedding, design an improved pre-trained model, PLCodeBERT, for code embedding, and build a double-sequence RNN with exponential-decay attention for effective feature fusion. Experimental results on BePKT show that the proposed model achieves state-of-the-art performance in programming knowledge tracing. In addition, we verify that our PLCodeBERT-based code embedding strategy is complementary to existing knowledge tracing models and further improves their accuracy. As a by-product, PLCodeBERT also improves performance on other programming-related tasks such as code clone detection.
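
To make the fusion idea concrete, below is a minimal sketch of a double-sequence model with exponential-decay attention. It assumes a problem-embedding sequence (e.g., from a bipartite-graph embedding) and a code-embedding sequence (e.g., from a PLCodeBERT-style encoder) as inputs; all module names, dimensions, and the exact decay form are illustrative assumptions, not the authors' PDKT architecture.

```python
# Hypothetical sketch: two GRUs consume problem embeddings and code embeddings,
# and an attention layer damps scores for older steps with exp(-theta * gap).
import torch
import torch.nn as nn


class ExpDecayAttention(nn.Module):
    """Attention whose weights are damped by exp(-theta * time gap)."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.theta = nn.Parameter(torch.tensor(0.1))  # learnable decay rate
        self.scale = hidden_dim ** 0.5

    def forward(self, query, keys, values):
        # query: (B, H); keys/values: (B, T, H)
        scores = torch.einsum("bh,bth->bt", query, keys) / self.scale
        T = keys.size(1)
        # temporal distance of each past step from the current step
        gaps = torch.arange(T - 1, -1, -1, device=keys.device, dtype=scores.dtype)
        weights = torch.softmax(scores, dim=-1) * torch.exp(-self.theta.abs() * gaps)
        weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-8)
        return torch.einsum("bt,bth->bh", weights, values)


class DoubleSequenceKT(nn.Module):
    """Fuses a problem-embedding sequence and a code-embedding sequence."""

    def __init__(self, prob_dim=128, code_dim=768, hidden=128):
        super().__init__()
        self.prob_rnn = nn.GRU(prob_dim, hidden, batch_first=True)
        self.code_rnn = nn.GRU(code_dim, hidden, batch_first=True)
        self.attn = ExpDecayAttention(hidden)
        self.out = nn.Linear(2 * hidden, 1)  # predicts P(correct) for the next attempt

    def forward(self, prob_seq, code_seq):
        # prob_seq: (B, T, prob_dim) problem embeddings (e.g., bipartite-graph based)
        # code_seq: (B, T, code_dim) code embeddings (e.g., PLCodeBERT based)
        hp, _ = self.prob_rnn(prob_seq)
        hc, _ = self.code_rnn(code_seq)
        ctx = self.attn(hp[:, -1], hp, hc)       # decay-weighted summary of past code states
        logit = self.out(torch.cat([hp[:, -1], ctx], dim=-1))
        return torch.sigmoid(logit).squeeze(-1)  # probability of a correct next submission


if __name__ == "__main__":
    model = DoubleSequenceKT()
    p = torch.randn(4, 20, 128)   # 4 students, 20 attempts each
    c = torch.randn(4, 20, 768)
    print(model(p, c).shape)      # torch.Size([4])
```

The decay term mirrors the forgetting-curve intuition behind the model: older interactions contribute less to the prediction of the next attempt unless the attention scores strongly favor them.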
