HeTROPY: Explainable learning diagnostics via heterogeneous maximum-entropy and multi-spatial knowledge representation

Abstract Autonomous learning diagnostics, in which students' strengths and weaknesses are inferred from their observed performance data, is a challenging task in e-learning systems. Current student knowledge models alleviate some learning problems (e.g. predicting student performance), but they neglect learning diagnostics, which requires causal reasoning. To this end, we propose a novel heterogeneous attention interpreter with a maximum-entropy regularizer on top of a student knowledge model to achieve explainable learning diagnostics. Our model segregates the impact of homogeneous knowledge points while promoting heterogeneous relatives by maximizing their chance of contributing to the prediction. We also propose a multi-spatial knowledge representation that generalizes readily to other data-driven educational tasks. Extensive experiments on real-world datasets show that the proposed method enhances the model's explanatory power, and thus increases the trustworthiness of the learning diagnostics; it also brings a notable improvement in accuracy on the student performance prediction task. These findings can be adopted in various types of e-learning systems to help teachers gain insight into student learning states and diagnose learning problems.
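
To make the maximum-entropy regularizer concrete, the following is a minimal PyTorch sketch, not the paper's implementation: attention weights over candidate knowledge points are computed as usual, their Shannon entropy is measured, and that entropy is subtracted from the task loss so that minimizing the loss rewards spreading attention mass across heterogeneous relatives instead of collapsing onto a single point. All names, shapes, and the stand-in predictor (attention_with_entropy, lambda_ent, the 10 candidate points) are illustrative assumptions.

import torch
import torch.nn.functional as F

def attention_with_entropy(query, keys, values, eps=1e-8):
    # Scaled dot-product attention over candidate knowledge points.
    d_k = keys.size(-1)
    scores = query @ keys.transpose(-2, -1) / d_k ** 0.5   # (batch, 1, n_points)
    weights = F.softmax(scores, dim=-1)
    # Shannon entropy H(w) = -sum_i w_i * log(w_i), averaged over the batch.
    entropy = -(weights * (weights + eps).log()).sum(dim=-1).mean()
    context = weights @ values                              # (batch, 1, d_v)
    return context, weights, entropy

# Illustrative training step: maximizing attention entropy is realized
# by subtracting it from the task loss (all shapes here are assumptions).
lambda_ent = 0.1                                    # regularization strength
query = torch.randn(32, 1, 64, requires_grad=True)
keys = torch.randn(32, 10, 64)                      # 10 candidate knowledge points
values = torch.randn(32, 10, 64)
labels = torch.randint(0, 2, (32,)).float()

context, weights, entropy = attention_with_entropy(query, keys, values)
logits = context.squeeze(1).sum(dim=-1)             # stand-in response predictor
loss = F.binary_cross_entropy_with_logits(logits, labels) - lambda_ent * entropy
loss.backward()

Because the optimizer minimizes the loss, the subtracted entropy term pushes the attention distribution away from one-hot solutions; lambda_ent controls how strongly heterogeneous knowledge points are kept in contention.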
