Utilizing Web Scraping and Natural Language Processing to Better Inform Pedagogical Practice

This research full paper describes how web scraping and natural language processing can be used to answer complex questions in computer science education. We apply connectivism as the theoretical framework and demonstrate how web scraping can extract large amounts of data from publicly available web pages, pooling data from a wider array of sources to further knowledge in the field. We also discuss how natural language processing can be used to reliably obtain salient information from textual data, and how it can complement qualitative analysis. To illustrate these techniques in practice, we present a specific application in which we examine current trends in the job market for computer science students. The information gathered in this example points to additional areas for educational consideration, such as offering students instruction in the Python programming language and in machine learning. The job postings also indicate a clear need for applicants to demonstrate programming and testing skills. Although programming may already be taught, testing is widely considered a knowledge deficiency. This suggests that educators should consider placing increased emphasis on testing, so that students are adequately prepared for their careers and can transfer what they have been taught to critically assess and debug their own programs.
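The general workflow described above (collect the text of job postings, then count which skills they mention) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the pipeline used in the paper: the sample postings, the skill vocabulary, and the helper names (TextExtractor, extract_text, count_skill_mentions) are all hypothetical, and the HTML is embedded inline rather than fetched from a live site.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_text(html):
    """Return the visible text of one HTML document as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical skill vocabulary; a real study would derive or validate this list.
SKILLS = ("python", "java", "testing", "machine learning")


def count_skill_mentions(postings_html, skills=SKILLS):
    """Count how many postings mention each skill at least once."""
    counts = Counter()
    for html in postings_html:
        text = extract_text(html).lower()
        for skill in skills:
            # Whole-word match, so "java" does not match "javascript".
            if re.search(r"\b" + re.escape(skill) + r"\b", text):
                counts[skill] += 1
    return counts


# Made-up postings standing in for scraped pages (assumption, not real data).
SAMPLE_POSTINGS = [
    "<html><body><h1>Software Engineer</h1>"
    "<p>Python and testing experience required.</p></body></html>",
    "<html><body><p>Seeking a data scientist with machine learning "
    "and Python skills.</p><script>var java = 1;</script></body></html>",
    "<div><p>QA analyst: manual and automated testing.</p></div>",
]

if __name__ == "__main__":
    for skill, n in count_skill_mentions(SAMPLE_POSTINGS).most_common():
        print(skill, n)
```

Counting postings rather than raw term occurrences keeps a single verbose posting from dominating the totals. In practice, pages would first be downloaded in accordance with each site's terms of use, and a dedicated NLP toolkit would typically replace the simple keyword match.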
