Joint Multi-Domain Learning for Automatic Short Answer Grading

One of the fundamental challenges in building any intelligent tutoring system is its ability to automatically grade short student answers. A typical automatic short answer grading (ASAG) system grades student answers across multiple domains (or subjects). Grading a student answer requires a supervised machine learning model that evaluates the similarity of the student answer to the reference answer(s). We observe that, unlike in typical textual similarity or entailment tasks, the notion of similarity here is not universal. On one hand, paraphrasal constructs of the language can indicate similarity independent of the domain. On the other hand, two words or phrases that are not strict synonyms of each other might mean the same thing in certain domains. Building on this observation, we propose JMD-ASAG, the first joint multi-domain deep learning architecture for automatic short answer grading, which performs domain adaptation by learning generic and domain-specific aspects from the limited domain-wise training data. JMD-ASAG not only learns the domain-specific characteristics but also overcomes the dependence on a large corpus by learning the generic characteristics from the task-specific data itself. On a large-scale industry dataset and a benchmarking dataset, we show that our model performs significantly better than existing techniques, which either learn domain-specific models or adapt a generic similarity scoring model from a large corpus. Further, on the benchmarking dataset, we report state-of-the-art results against all existing non-neural and neural models.
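
To make the architecture concrete, below is a minimal PyTorch sketch of the joint multi-domain idea: a generic encoder shared across all domains, one lightweight encoder per domain, and a classifier over the concatenation of both pair representations. This is an illustrative reconstruction under stated assumptions, not the authors' published implementation; the class name `JointMultiDomainGrader`, the BiLSTM encoders, the mean pooling, and the `[s; r; |s - r|; s * r]` pair features are all choices made for this example.

```python
import torch
import torch.nn as nn


class JointMultiDomainGrader(nn.Module):
    """Scores a (student answer, reference answer) pair using a generic
    encoder shared across domains plus one encoder per domain.

    Illustrative sketch only; names and layer choices are assumptions.
    """

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_domains, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Generic similarity component: shared by every domain, so it can
        # pick up paraphrase-level regularities from all training data.
        self.generic_encoder = nn.LSTM(embed_dim, hidden_dim,
                                       batch_first=True, bidirectional=True)
        # Domain-specific components: one per domain/subject, each trained
        # only on that domain's (limited) examples.
        self.domain_encoders = nn.ModuleList(
            nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
            for _ in range(num_domains)
        )
        # Pair features are [s; r; |s - r|; s * r] over 2*hidden_dim vectors,
        # computed for both components and concatenated: 2 * 4 * 2 * hidden_dim.
        self.classifier = nn.Linear(16 * hidden_dim, num_classes)

    def _encode(self, encoder, tokens):
        # Mean-pool the BiLSTM states into a fixed-size sentence vector.
        states, _ = encoder(self.embedding(tokens))
        return states.mean(dim=1)

    def _pair_features(self, encoder, student, reference):
        s = self._encode(encoder, student)
        r = self._encode(encoder, reference)
        return torch.cat([s, r, (s - r).abs(), s * r], dim=-1)

    def forward(self, student, reference, domain_id):
        # domain_id selects the domain-specific encoder; for simplicity we
        # assume all examples in the batch come from the same known domain.
        generic = self._pair_features(self.generic_encoder, student, reference)
        specific = self._pair_features(self.domain_encoders[domain_id],
                                       student, reference)
        return self.classifier(torch.cat([generic, specific], dim=-1))


# Usage example: a batch of 4 token-id sequence pairs from domain 0.
model = JointMultiDomainGrader(vocab_size=10_000, embed_dim=100,
                               hidden_dim=128, num_domains=3, num_classes=2)
student = torch.randint(0, 10_000, (4, 20))
reference = torch.randint(0, 10_000, (4, 25))
logits = model(student, reference, domain_id=0)  # shape: (4, 2)
```

The design point this sketch is meant to surface is that the generic encoder's parameters receive gradients from every domain's examples, while each domain encoder sees only its own. That is what lets such a model learn generic similarity from the task-specific data itself rather than from an external large corpus.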
