Education Data Science: Past, Present, Future

This AERA Open special topic concerns the large, emerging research area of education data science (EDS). In a narrow sense, EDS applies statistics and computational techniques to educational phenomena and questions. In a broader sense, it is an umbrella for a fleet of new computational techniques being used to identify new forms of data, measures, descriptives, predictions, and experiments in education. Not only are old research questions being analyzed in new ways, but new questions are also emerging from the novel data and discoveries that EDS techniques produce. This overview defines the emerging field of education data science and discusses 12 articles that illustrate an AERA perspective on EDS. It describes the promise EDS holds for the field of education and the areas where EDS scholars could productively focus going forward.
