Extracting Topics from Open Educational Resources

In recent years, Open Educational Resources (OERs) have been identified as critical for meeting the growing global demand for education. OERs have high potential to serve learners in many different circumstances, as they are available in a wide range of contexts. However, the generally low quality of OER metadata is one of the main reasons for the lack of personalised services such as search and recommendation, and as a result the applicability of OERs remains limited. Metadata about the topics (subjects) an OER covers is essential for learners to build effective learning pathways towards their individual learning objectives. In this paper, we therefore report on a work-in-progress project that proposes an OER topic extraction approach, based on text mining techniques, to generate high-quality OER metadata about topic distribution. The approach consists of: 1) collecting 123 lectures from Coursera and Khan Academy in the area of data science related skills, 2) applying Latent Dirichlet Allocation (LDA) to the collected resources in order to extract existing topics related to these skills, and 3) defining the topic distributions covered by a particular OER. To evaluate our model, we used a dataset of educational resources from YouTube and compared our topic distribution results with their manually defined target topics, with the help of 3 experts in the area of data science. Our model extracted topics with an F1-score of 79%.
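To make steps 2) and 3) concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling on pre-tokenised lecture texts. It is an illustration under stated assumptions, not the implementation used in this work: the abstract does not specify the inference method, preprocessing, number of topics, or hyperparameters, so the function name `lda_gibbs`, the values of `alpha`, `beta`, and `iterations`, and the toy documents are all hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists. Returns (theta, phi, vocab), where theta[d][k]
    is the share of topic k in document d and phi[k][v] is the probability
    of vocabulary word v under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed estimates: per-document topic distribution (step 3) and
    # per-topic word distribution (step 2).
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi, vocab


# Hypothetical, already tokenised lecture snippets (not the paper's data).
toy_docs = [
    "regression model variance regression estimate".split(),
    "sql query table join query index".split(),
    "model estimate variance sample regression".split(),
    "table index join sql database".split(),
]

theta, phi, vocab = lda_gibbs(toy_docs, num_topics=2)
for d, row in enumerate(theta):
    print(d, [round(p, 2) for p in row])
```

Each row of `theta` is the kind of topic distribution that could be stored as OER metadata for one resource. On a real corpus, a maintained library implementation of LDA would normally be preferable to this hand-written sampler, and the number of topics would need to be chosen, for example with a topic coherence measure.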
