Recent Contributions of Data Mining to Language Learning Research

Abstract: This paper reviews the role of data mining in research on second language (L2) learning. Following a general introduction to the topic, we summarize three areas of data mining research (clustering techniques, text mining, and social network analysis) with examples from both the broader field and studies conducted by the authors. The application of data mining to second language learning research is relatively new, and more theoretical and empirical support is needed for the appropriate collection, use, and interpretation of data for specific research and pedagogical objectives. The three examples we introduce illustrate how new data sources available in online environments can be analyzed to better understand the optimal instructional context for corpus-based vocabulary learning (clustering), the characteristics and patterns of collaborative written interaction in Google Docs (text mining and visualization), and issues of access and community in computer-mediated discussion (social network analysis). We conclude by discussing the implications of these techniques for L2 research.
