An automatic text comprehension classifier based on mental models and latent semantic features

Reading comprehension is one of the main concerns of educational institutions, as it shapes students' ability to understand and learn accurately from a given information source (e.g. textbooks, articles, papers). However, few approaches integrate digital sources of educational information with automated systems that detect whether an individual has comprehended a given reading task. The main contribution of this work is a text comprehension classification methodology for detecting reading comprehension failures in educational institutions. The proposed approach combines situational model theories from psycholinguistics with latent semantic analysis from natural language processing. Students' documents are characterized numerically using structural information, such as the usage of text connectors, together with latent semantic features, and this characterization is used as input for traditional classification algorithms. The resulting automated classifier determines whether or not a given student comprehended the information in the given stimulus documents. To evaluate the proposed methodology, an experimental group of students must answer a set of questions about a set of stimulus documents. We performed experiments with first-year students from the Engineering and Linguistics undergraduate schools at the University of Chile, with promising results.
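As a rough illustration of the kind of numerical characterization described above, the following minimal sketch turns a student's answer into a feature vector made of connector frequencies plus a similarity score against the stimulus document. It is not the authors' implementation: the connector list, the function names, and the example texts are all hypothetical, and a plain term-frequency cosine similarity stands in for the latent semantic step (latent semantic analysis would first project the term-document matrix into a lower-dimensional space via a truncated singular value decomposition).

```python
import math
import re
from collections import Counter

# Hypothetical connector list, for illustration only; the source does not give one.
CONNECTORS = ("because", "however", "therefore", "although")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def connector_features(tokens):
    """Relative frequency of each connector in the token list."""
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty answers
    return [counts[c] / total for c in CONNECTORS]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def featurize(answer, stimulus):
    """Feature vector: connector frequencies followed by similarity to the stimulus."""
    answer_tokens = tokenize(answer)
    # Assumption: raw term-frequency cosine replaces the LSA-space similarity.
    similarity = cosine(Counter(answer_tokens), Counter(tokenize(stimulus)))
    return connector_features(answer_tokens) + [similarity]


# Hypothetical stimulus text and student answer.
stimulus = "Plants grow because sunlight drives photosynthesis."
answer = "Plants grow because they get sunlight."
features = featurize(answer, stimulus)
```

Vectors like `features`, paired with a comprehended / not comprehended label for each student, could then be passed to any standard classifier; which classifier works best is left open here.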
