Observing Lemmatization Effect in LSA Coherence and Comprehension Grading of Learner Summaries

Current work on learner evaluation in Intelligent Tutoring Systems (ITSs) is moving towards open-ended diagnosis of educational content. One of the main difficulties of this approach is automatically understanding natural language. Our work aims to produce automatic evaluation of learner summaries in Basque. Therefore, in addition to language comprehension, difficulties emerge from Basque morphology itself. In this work, Latent Semantic Analysis (LSA) is used to model comprehension in a language in which lemmatization has been shown to be highly significant. This paper tests the influence of corpus lemmatization on automatic comprehension and coherence grading. Summaries graded by human judges for coherence and comprehension have been tested against LSA-based measures derived from lemmatized and non-lemmatized source corpora. After lemmatization, the number of single terms known to LSA was reduced by 56%. As a result, LSA grades almost match human measures, producing no significant differences between the lemmatized and non-lemmatized approaches.
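The comparison described above can be illustrated with a minimal sketch. It is not the pipeline used in this work: the lemma table is a hypothetical English toy stand-in for a Basque lemmatizer, the corpus is invented for illustration, and the LSA space is built with a plain truncated SVD over raw term counts (no weighting scheme is assumed). It only shows how lemmatization shrinks the vocabulary known to LSA and how a summary can be compared with a source text by cosine similarity in the reduced space.

```python
import numpy as np

# Hypothetical toy lemma table; a real system would use a Basque lemmatizer.
LEMMAS = {"cats": "cat", "dogs": "dog", "chased": "chase", "chases": "chase"}

# Invented toy corpus, for illustration only.
DOCS = [
    "the cat chased the dog",
    "cats chase dogs",
    "the dog chases cats",
    "birds sing in the morning",
]

def tokenize(text, lemmatize=False):
    """Lowercase, split on whitespace, and optionally map tokens to lemmas."""
    tokens = text.lower().split()
    return [LEMMAS.get(t, t) for t in tokens] if lemmatize else tokens

def build_lsa(docs, k, lemmatize=False):
    """Return (term index, term-by-k matrix) from a truncated SVD of raw counts."""
    tokenized = [tokenize(d, lemmatize) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenized):
        for t in doc:
            counts[index[t], j] += 1
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    return index, u[:, :k]

def project(text, index, term_space, lemmatize=False):
    """Fold a text into the LSA space; terms unknown to the space are ignored."""
    vec = np.zeros(term_space.shape[0])
    for t in tokenize(text, lemmatize):
        if t in index:
            vec[index[t]] += 1
    return vec @ term_space

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def grade(summary, source, docs, k=2, lemmatize=False):
    """Similarity between a summary and its source text in the LSA space."""
    index, space = build_lsa(docs, k, lemmatize)
    return cosine(project(summary, index, space, lemmatize),
                  project(source, index, space, lemmatize))
```

Calling `grade(summary, source, DOCS, lemmatize=True)` and `grade(summary, source, DOCS, lemmatize=False)` on the same pair gives the two scores that would then be compared against human grades; the sizes of the two term indices returned by `build_lsa` show the vocabulary reduction.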
