The Impact of Misspelled Words on Automated Computer Scoring: A Case Study of Scientific Explanations

Automated computerized scoring systems (ACSSs) are increasingly used to analyze text in many educational settings. Nevertheless, the impact of misspelled words (MSW) on scoring accuracy remains to be investigated in many domains, particularly jargon-rich disciplines such as the life sciences. Empirical studies confirm that MSW are a pervasive feature of human-generated text and that, despite improvements, spell-check and auto-replace programs continue to make significant errors. Our study explored four research questions relating to MSW and text-based computer assessments: (1) Do English language learners (ELLs) produce spelling errors of the same magnitude and type as non-ELLs? (2) To what degree do MSW impact concept-specific computer scoring rules? (3) What impact do MSW have on computer scoring accuracy? and (4) Are MSW more likely to impact false-positive or false-negative feedback to students? We found that although ELLs produced twice as many MSW as non-ELLs, MSW were relatively uncommon in our corpora. The MSW in the corpora were found to be important features of the computer scoring models. Although MSW did not significantly or meaningfully impact computer scoring efficacy across nine different computer scoring models, MSW had a greater impact on the scoring algorithms for naïve ideas than on those for key concepts. Linguistic and concept redundancy in student responses explains the weak connection between MSW and scoring accuracy. Lastly, we found that MSW tend to have a greater impact on false-positive feedback than on false-negative feedback. We discuss the implications of these findings for the development of next-generation science assessments.
