Practical and Ethical Challenges of Large Language Models in Education: A Systematic Literature Review

Educational technology innovations built on large language models (LLMs) have shown the potential to automate the laborious process of generating and analysing textual content. While such innovations have been developed to automate a range of educational tasks (e.g., question generation, feedback provision, and essay grading), there are concerns regarding their practicality and ethicality. Such concerns may hinder future research and the adoption of LLM-based innovations in authentic educational contexts. To address this, we conducted a systematic literature review of 118 peer-reviewed papers published since 2017 to identify the current state of research on using LLMs to automate and support educational tasks. We also identified the practical and ethical challenges of LLM-based innovations by assessing their technological readiness, model performance, replicability, system transparency, privacy, equality, and beneficence. The findings were summarised into three recommendations for future studies: updating existing innovations with state-of-the-art models (e.g., GPT-3), embracing the initiative of open-sourcing models and systems, and adopting a human-centred approach throughout the development process. These recommendations could support future research to develop practical and ethical innovations that support diverse educational tasks and benefit students, teachers, and institutions.
