Exploring the differences between low-stakes proctored and unproctored language testing using an Internet-based application

Abstract In this work, we explore the differences between proctored and unproctored Internet administration of a low-stakes Basque language test, considering demographic factors such as age, gender, and knowledge level in the subject. To this end, we developed an ad hoc application that implements a set of filters and techniques to control dropout and non-serious test takers, two of the main threats to low-stakes testing. A total of 2,095 sessions were recorded. The results show that age and knowledge level influence test performance, whereas gender does not. Moreover, taking the test in an unproctored setting leads to better results. Finally, although the time needed to complete the test is comparable in both cases, it is better invested in the unproctored version: compared to the proctored version, less time is devoted to easy questions and more to difficult ones. These results suggest that the unproctored version better measures knowledge level in low-stakes language tests, because it is carried out in an environment familiar to the examinee and lacks the pressure of proctored testing.
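The two threats named above, dropout and non-serious responding, are commonly screened with simple session-level rules: discard sessions that do not answer every item, and flag sessions whose response times are too fast for the items to have been read. A minimal sketch of such filters follows; the `Session` structure, the field names, and the 3-second median threshold are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Session:
    """One recorded test session (illustrative structure)."""
    answers: list          # answers actually given
    response_times: list   # seconds spent on each answered item
    total_items: int       # number of items in the test

def is_dropout(s: Session) -> bool:
    # Dropout: the examinee abandoned the test before answering every item.
    return len(s.answers) < s.total_items

def is_non_serious(s: Session, min_median_seconds: float = 3.0) -> bool:
    # Non-serious responding: answers arrive faster than the items
    # could plausibly be read (median response time below a threshold).
    return median(s.response_times) < min_median_seconds

def keep_valid(sessions, min_median_seconds: float = 3.0):
    # Retain only complete, plausibly serious sessions for analysis.
    return [s for s in sessions
            if not is_dropout(s)
            and not is_non_serious(s, min_median_seconds)]
```

A per-item time threshold or an explicit seriousness-check question at the end of the test are common alternatives to the median rule used here.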
