Sobek: a Text Mining Tool for Educational Applications

This paper presents a mining tool to extract relevant terms and relationships from texts, and proposes its use in educational applications. A particular text mining technique is employed to analyze texts and build graphs from them, in which nodes represent concepts and edges represent the relationships between them. Some adjustments to the original mining and representation methods are proposed here, in order to provide results that are more suitable for our educational applications. Two experiments exemplifying the extraction of graphs from students' essays are presented. Results showed that the mining tool was able to identify a considerable number of relevant terms from the texts analyzed, providing concise representations of documents which can support students' and teachers' tasks.

In recent years, data mining and text mining have become more popular in the field of Education, mostly because of the growing number of systems that store large databases about students, their access to available material, their assignments and the corresponding grades. This expansion led to the establishment of a community committed to Educational Data Mining applications. This community is concerned mostly with developing methods for exploring data coming from educational settings, and with employing those methods to better understand students and learning processes [2]. In this work, our main goal has been to design and develop a text mining tool to be used in educational applications.
Below, we list a few examples of the uses of the tool to support either students' or teachers' work:

• Helping teachers to evaluate students' writing from a qualitative point of view;
• Assisting teachers in identifying the significance of students' contributions in discussion forums;
• Supporting reading strategies;
• Supporting text writing.

A particular text mining technique based on statistical analysis has been used to extract graphs from texts, representing relevant terms and their relationships [1]. This technique has been customized here in order to provide results that are more suitable for the targeted applications. Typically, for long documents, the original mining algorithm extracted graphs that were too large to be comprehensible in the proposed educational applications. The changes implemented focused on reducing the number of nodes and relationships in the graphs, and on the extraction of compound terms.

The next section presents different methods for representing data extracted from texts, including graph-based approaches. Section 3 describes the text mining method on which we have based our …
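To make the idea of a graph of terms and relationships more concrete, the following is a minimal sketch of a generic frequency-based approach: terms that occur often enough become nodes, and pairs of such terms that co-occur in the same sentence often enough become edges. It is an illustration only and is not the algorithm used by the tool, which is not specified in this excerpt. The function names, the stopword list and the two thresholds (`min_term_freq`, `min_edge_freq`) are assumptions introduced here. Raising the thresholds is one simple way to keep a graph small for long documents.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "from"}


def tokenize(sentence):
    """Lowercase a sentence and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def build_graph(text, min_term_freq=2, min_edge_freq=2):
    """Return (nodes, edges) for a simple sentence-level co-occurrence graph.

    nodes: set of terms occurring at least min_term_freq times.
    edges: dict mapping a sorted term pair to the number of sentences in which
           both terms appear, kept only if that count is >= min_edge_freq.
    """
    sentences = [tokenize(s) for s in re.split(r"[.!?]+", text)]
    term_freq = Counter(w for s in sentences for w in s)
    nodes = {t for t, n in term_freq.items() if n >= min_term_freq}

    edge_freq = Counter()
    for s in sentences:
        present = sorted(set(s) & nodes)  # frequent terms in this sentence
        edge_freq.update(combinations(present, 2))

    edges = {pair: n for pair, n in edge_freq.items() if n >= min_edge_freq}
    return nodes, edges


sample = (
    "Text mining extracts terms. Mining tools build graphs from text. "
    "Graphs represent terms. Text mining supports students."
)
nodes, edges = build_graph(sample)
print(sorted(nodes))
print(edges)
```

With both thresholds set to 2, infrequent words in the sample are dropped from the node set and only term pairs that share at least two sentences are kept as edges, which shows how pruning by frequency reduces both nodes and relationships.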

[1] San Murugesan, et al. Extraction of keyterms by simple text mining for business information retrieval, 2005, IEEE International Conference on e-Business Engineering (ICEBE'05).

[2] Daniel S. Leite, et al. Extractive Automatic Summarization: Does more Linguistic Knowledge Make a Difference?, 2007.

[3] Yehuda Lindell, et al. Text Mining at the Term Level, 1998, PKDD.

[4] Abraham Kandel, et al. Graph-Theoretic Techniques for Web Content Mining, 2005, Series in Machine Perception and Artificial Intelligence.

[5] Chungui Liu, et al. On N-layer Vector Space Model-Based Web Information Retrieval, 2010, 2010 6th International Conference on Wireless Communications Networking and Mobile Computing (WiCOM).

[6] Eliseo Reategui, et al. Using Text-Mining to Support the Evaluation of Texts Produced Collaboratively, 2009, WCCE.

[7] Marie-Laure Mugnier, et al. Graph-based Knowledge Representation - Computational Foundations of Conceptual Graphs, 2008, Advanced Information and Knowledge Processing.

[8] Ryan S. Baker, et al. The State of Educational Data Mining in 2009: A Review and Future Visions, 2009, EDM 2009.

[9] Craig G. Nevill-Manning, et al. Using domain knowledge for text mining, 2006.

[10] Christopher D. Manning, et al. Introduction to Information Retrieval, 2010, J. Assoc. Inf. Sci. Technol.

[11] Eleni E. Mangina, et al. Utilizing vector space models for user modeling within e-learning environments, 2008, Comput. Educ.

[12] Patricia Alejandra Behar, et al. Qualitative Analysis of Discussion Forums, 2011.

[13] C A Nelson, et al. Learning to Learn, 2017, Encyclopedia of Machine Learning and Data Mining.

[14] Dik Lun Lee, et al. Document Ranking and the Vector-Space Model, 1997, IEEE Softw.

[15] Katerina T. Frantzi, et al. Automatic recognition of multi-word terms, 1998.

[16] Adil Alpkocak, et al. An Extended Vector Space Model for Content Based Image Retrieval, 2009, CLEF.

[17] Vijay V. Raghavan, et al. A critical analysis of vector space model for information retrieval, 1986, J. Am. Soc. Inf. Sci.