A data mining approach to ontology learning for automatic content-related question-answering in MOOCs.

The advent of Massive Open Online Courses (MOOCs) allows a massive volume of registrants to enrol. This research aims to provide MOOC registrants with automatic content-related feedback to fulfil their cognitive needs. A framework is proposed that consists of three modules: the subject ontology learning module, the short text classification module, and the question-answering module. Unlike previous research, a regular expression parser approach is used to identify relevant concepts for ontology learning, and the relevant concepts are extracted from unstructured documents. To build the concept hierarchy, a frequent pattern mining approach is used, guided by a heuristic function that ensures sibling concepts are at the same level in the hierarchy. As this process does not require specific lexical or syntactic information, it can be applied to any subject. To validate the approach, the resulting ontology is used in a question-answering system that analyses students' content-related questions and generates answers for them. Textbook end-of-chapter questions and answers are used to validate the question-answering system. The resulting ontology is compared against the use of Text2Onto in the question-answering system, and it achieved favourable results. Finally, different indexing approaches based on a subject's ontology are investigated for classifying short text in MOOC forum discussion data: unigram-based, concept-based, and hierarchical concept indexing. The experimental results show that the ontology-based feature indexing approaches outperform the unigram-based indexing approach. Experiments are conducted in binary and multiple-label classification settings. The results are consistent and show that hierarchical concept indexing outperforms both concept-based and unigram-based indexing. The bagging and random forest classifiers achieved the best results among the tested classifiers.
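The three indexing approaches named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy ontology `PARENT`, the naive substring matching, and the reading of hierarchical concept indexing as "each matched concept also counts towards its ancestors" are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy ontology: each concept maps to its parent (None marks the root).
PARENT = {
    "database": None,
    "normalization": "database",
    "normal form": "normalization",
    "sql": "database",
    "join": "sql",
}

def unigram_index(text):
    """Unigram-based indexing: a bag of single lowercase tokens."""
    return Counter(text.lower().split())

def concept_index(text):
    """Concept-based indexing: count ontology concepts found in the text.

    Uses naive substring matching, which is an assumption of this sketch.
    """
    lowered = text.lower()
    return Counter({c: lowered.count(c) for c in PARENT if c in lowered})

def hierarchical_concept_index(text):
    """Hierarchical concept indexing (assumed form): each matched concept
    also adds its count to every ancestor up to the root."""
    features = Counter()
    for concept, count in concept_index(text).items():
        node = concept
        while node is not None:
            features[node] += count
            node = PARENT[node]
    return features

post = "How does a join differ from a normal form"
print(unigram_index(post))
print(concept_index(post))
print(hierarchical_concept_index(post))
```

In this sketch the forum post shares no unigram with a post that only mentions "sql" or "normalization", whereas the hierarchical features place both posts under the common ancestor "database". This is one plausible reason why ontology-based indexing could help on short texts.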
