Educational Data Classification Framework for Community Pedagogical Content Management using Data Mining

Recent years have witnessed a significant surge in the awareness and use of social media, especially community Question and Answer (Q&A) websites, by academics and professionals. These sites are large repositories of data that open new avenues for research through data mining and data analysis, for example by investigating trending topics and the topics that attract the most user attention. Educational Data Mining (EDM) techniques can be used to unveil the potential of community Q&A websites. Conventional EDM approaches are concerned with generating data in systematic ways and mining it for knowledge discovery to improve educational processes. This paper proposes a novel idea: exploring data already generated by millions of users, with a variety of expertise in their particular domains, on a common platform such as Stack Overflow (SO), a community Q&A website where users post questions and receive answers about particular problems. This study presents an EDM framework that classifies community data into Software Engineering subjects. The framework classifies SO posts according to academic courses, along with their best solutions, to support learners. Moreover, it gives teachers, instructors, educators and other EDM stakeholders insight into commonly occurring subject-related problems, so that they can focus on them and design and manage their course delivery and teaching accordingly. The data mining framework preprocesses the data using NLP techniques and applies machine learning algorithms to classify it. Among the classifiers evaluated, SVM performs best, with 72.06% accuracy. Evaluation measures such as precision, recall and F1-score are also used to evaluate the best-performing classifier.
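The abstract does not state which NLP preprocessing steps, features or SVM implementation the framework uses. The Python below is a minimal sketch of the general pipeline it describes: preprocess posts, classify them into subjects, then score the predictions with accuracy, precision, recall and F1. It rests on several assumptions that are not from the paper: HTML-tag stripping, regex tokenization and a tiny hypothetical stop-word list for preprocessing; bag-of-words counts as features; and a one-vs-rest linear SVM without a bias term, trained with Pegasos-style subgradient descent. The subject labels (`testing`, `database`, `design`) and the posts are hypothetical toy data, not the authors' Stack Overflow dataset, and the sketch does not attempt to reproduce the reported 72.06% accuracy.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (a real pipeline would use a fuller one).
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "to", "and", "how",
              "do", "i", "my", "what", "when", "should", "use", "for", "with"}

# Hypothetical toy posts labelled with Software Engineering subjects (not the paper's data).
TRAIN = [
    ("How do I mock a dependency in a JUnit unit test?", "testing"),
    ("Unit test fails when assert compares floats", "testing"),
    ("SQL join query returns duplicate rows", "database"),
    ("How to index a table to speed up a SQL query", "database"),
    ("When should I use the singleton pattern?", "design"),
    ("Observer pattern versus event bus design", "design"),
]
TEST = [
    ("Assert in unit test never runs", "testing"),
    ("Slow SQL query on a large table", "database"),
    ("Singleton or observer pattern?", "design"),
]


def preprocess(post):
    """Lowercase, strip HTML tags, tokenize (keeping tokens like c++ and c#), drop stop words."""
    text = re.sub(r"<[^>]+>", " ", post.lower())
    return [tok for tok in re.findall(r"[a-z][a-z0-9+#]*", text) if tok not in STOP_WORDS]


def vectorize(tokens, vocab):
    """Bag-of-words term counts over a fixed vocabulary; unseen tokens are ignored."""
    counts = Counter(tokens)
    return [float(counts[term]) for term in vocab]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=20):
    """Binary linear SVM (no bias term) trained with Pegasos-style subgradient steps.

    y holds +1/-1 labels; examples are visited in a fixed order, so results are deterministic.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)                     # decreasing step size
            violated = label * dot(w, x) < 1.0        # is the hinge loss active here?
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink step from the L2 regularizer
            if violated:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def train_one_vs_rest(posts, labels, **svm_args):
    """One binary SVM per subject: that subject's posts are +1, all other posts are -1."""
    vocab = sorted({tok for post in posts for tok in preprocess(post)})
    X = [vectorize(preprocess(post), vocab) for post in posts]
    models = {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **svm_args)
              for c in sorted(set(labels))}
    return vocab, models


def predict(post, vocab, models):
    """Assign the subject whose SVM gives the highest score."""
    x = vectorize(preprocess(post), vocab)
    return max(models, key=lambda c: dot(models[c], x))


def evaluate(gold, pred):
    """Accuracy, per-class (precision, recall, F1), and the macro averages of those three."""
    report = {}
    for c in sorted(set(gold) | set(pred)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1)
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    macro = tuple(sum(scores[i] for scores in report.values()) / len(report) for i in range(3))
    return accuracy, report, macro


if __name__ == "__main__":
    vocab, models = train_one_vs_rest([p for p, _ in TRAIN], [lab for _, lab in TRAIN])
    gold = [lab for _, lab in TEST]
    pred = [predict(post, vocab, models) for post, _ in TEST]
    accuracy, report, macro = evaluate(gold, pred)
    print(pred, accuracy, macro)
```

Pegasos is used here only because it fits in a few lines of standard-library Python. In practice a library implementation, such as scikit-learn's `LinearSVC` combined with a `TfidfVectorizer`, would be the usual choice; the abstract does not say which SVM variant, kernel or feature representation the authors used. On real data, per-class precision and recall from `evaluate` show which subjects the classifier confuses, which a single accuracy figure hides.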
