Disentangling Setting and Mode Effects for Online Competence Assessment

Many large-scale competence assessments, such as the National Educational Panel Study (NEPS), have introduced novel test designs to improve response rates and measurement precision. In particular, unstandardized online assessments (UOA) offer an economical approach for reaching heterogeneous populations that otherwise would not participate in face-to-face assessments. Acknowledging the distinction between delivery, mode, and test setting, this chapter extends the theoretical background for dealing with mode effects in NEPS competence assessments (Kroehne and Martens in Zeitschrift für Erziehungswissenschaft 14:169–186, 2011) and discusses two specific facets of UOA: (a) the confounding of selection and setting effects and (b) the role of test-taking behavior as a mediator variable. We present a strategy that allows results from UOA to be integrated with results from proctored computerized assessments and that generalizes the idea of motivational filtering, known from the treatment of rapid-guessing behavior in low-stakes assessments. We particularly emphasize the relationship between paradata and the investigation of test-taking behavior, and we illustrate how a reference sample formed from competence assessments under standardized and supervised conditions can be used to increase the comparability of UOA in mixed-mode designs. The closing discussion reflects on the trade-off between data quality and the benefits of UOA.
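
The motivational filtering mentioned above builds on response-time-based identification of rapid guessing (see [47], [57], [71], [73]). As a rough illustration only, the following Python sketch flags responses whose response time falls below an item-level threshold and computes a response-time-effort index per test taker; the column names, the fixed threshold table, and the 0.9 effort cutoff are illustrative assumptions, not the chapter's actual procedure.

import pandas as pd

# Hypothetical item-level response-time thresholds (in seconds) below which a
# response is treated as rapid guessing rather than solution behavior.
THRESHOLDS = {"item_01": 3.0, "item_02": 5.0, "item_03": 4.0}

def flag_rapid_guessing(responses: pd.DataFrame, thresholds: dict) -> pd.DataFrame:
    """Mark a response as rapid guessing if its response time is below the
    item-specific threshold."""
    flagged = responses.copy()
    flagged["rapid_guess"] = (
        flagged["response_time"] < flagged["item_id"].map(thresholds)
    )
    return flagged

def response_time_effort(flagged: pd.DataFrame) -> pd.Series:
    """Response-time effort: proportion of solution-behavior responses per test taker."""
    return 1.0 - flagged.groupby("person_id")["rapid_guess"].mean()

def motivational_filter(flagged: pd.DataFrame, min_effort: float = 0.9) -> pd.DataFrame:
    """Keep only test takers whose effort index meets the (illustrative) cutoff."""
    effort = response_time_effort(flagged)
    keep = effort[effort >= min_effort].index
    return flagged[flagged["person_id"].isin(keep)]

if __name__ == "__main__":
    data = pd.DataFrame({
        "person_id": [1, 1, 1, 2, 2, 2],
        "item_id": ["item_01", "item_02", "item_03"] * 2,
        "response_time": [12.4, 20.1, 15.7, 1.2, 1.5, 2.0],
        "correct": [1, 0, 1, 0, 0, 1],
    })
    flagged = flag_rapid_guessing(data, THRESHOLDS)
    print(response_time_effort(flagged))   # effort index per person
    print(motivational_filter(flagged))    # person 2 (all rapid guesses) is removed

In the chapter's mixed-mode context, thresholds and filtering rules would be anchored in the proctored reference sample rather than fixed constants; the sketch only illustrates the filtering mechanics, not the proposed integration strategy.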

[1] Birk Diedenhofen, et al. PageFocus: Using paradata to detect and prevent cheating on online achievement tests, 2017, Behavior Research Methods.

[2] S. Wise. Effort Analysis: Individual Score Validation of Achievement Test Data, 2015.

[3] Ulf-Dietrich Reips. The Web Experiment Method: Advantages, Disadvantages, and Solutions, 2000.

[4] C. Shaw, et al. Applying an extended theoretical framework for data collection mode to health services research, 2010, BMC Health Services Research.

[5] Chockalingam Viswesvaran, et al. Meta-Analyses of Fakability Estimates: Implications for Personality Measurement, 1999.

[6] D. Dillman. Mail and Internet Surveys, 1999.

[7] Mario Callegaro. Do You Know Which Device Your Respondent Has Used to Take Your Online Survey?, 2010.

[8] Frank Goldhammer, et al. Invariance of the Response Processes Between Gender and Modes in an Assessment of Reading, 2019, Front. Appl. Math. Stat.

[9] Rob Kitchin, et al. What makes Big Data, Big Data? Exploring the ontological characteristics of 26 datasets, 2016, Big Data Soc.

[10] F. Preckel, et al. Online- versus paper-pencil-version of a high potential intelligence test, 2003.

[11] Claus H. Carstensen, et al. Taking the Missing Propensity Into Account When Estimating Competence Scores, 2015, Educational and Psychological Measurement.

[12] Sandip Sinharay, et al. Assessment of Person Fit for Mixed-Format Tests, 2015.

[13] B. Finn. Measuring Motivation in Low-Stakes Assessments, 2015.

[14] Scott T. Frein. Comparing In-Class and Out-of-Class Computer-Based Tests to Traditional Paper-and-Pencil Tests in Introductory Psychology Courses, 2011.

[15] Deborah L. Schnipke, et al. Modeling Item Response Times With a Two-State Mixture Model: A New Method of Measuring Speededness, 1997.

[16] Jason S. Zack, et al. Online Counseling: A Handbook for Mental Health Professionals, 2003.

[17] B. Maddox. Talk and Gesture as Process Data, 2017.

[18] Joop J. Hox, et al. Measurement equivalence in mixed mode surveys, 2015, Front. Psychol.

[19] Scott Fricker, et al. An Experimental Comparison of Web and Telephone Surveys, 2005.

[20] Yi-Hsuan Lee, et al. Using response time to investigate students' test-taking behaviors in a NAEP computer-based study, 2014, Large-scale Assessments in Education.

[21] Mary M. Johnston. Applying Solution Behavior Thresholds to a Noncognitive Measure to Identify Rapid Responders: An Empirical Investigation, 2016.

[22] Christine E. DeMars, et al. Low Examinee Effort in Low-Stakes Assessment: Problems and Potential Solutions, 2005.

[23] Mick P. Couper, et al. Why Do Web Surveys Take Longer on Smartphones?, 2017.

[24] Barry Schouten, et al. Measurement Effects of Survey Mode on the Equivalence of Attitudinal Rating Scale Questions, 2013.

[25] Oliver Lüdtke, et al. Test-taking engagement in PIAAC, 2016.

[26] Tracy L. Tuten, et al. Classifying Response Behaviors in Web-based Surveys, 2006, J. Comput. Mediat. Commun.

[27] Geert Molenberghs, et al. Person fit for test speededness: normal curvatures, likelihood ratio tests and empirical Bayes estimates, 2010.

[28] Geert Molenberghs, et al. A Method for Evaluating Mode Effects in Mixed-mode Surveys, 2010.

[29] Does it Matter How Data are Collected? A Comparison of Testing Conditions and the Implications for Validity, 2009.

[30] M. Racsmány, et al. Hungarian Validation of the Penn State Worry Questionnaire (PSWQ), 2015.

[31] Ou Lydia Liu, et al. Evaluating the Impact of Careless Responding on Aggregated-Scores: To Filter Unmotivated Examinees or Not?, 2017.

[32] Kyle C. Huff, et al. The comparison of mobile devices to computers for web-based assessments, 2015, Comput. Hum. Behav.

[33] Shelby J. Haberman, et al. A New Procedure for Detection of Students' Rapid Guessing Responses Using Response Time, 2016.

[34] Pei-Luen Patrick Rau, et al. Understanding lurkers in online communities: A literature review, 2014, Comput. Hum. Behav.

[35] Timo Gnambs, et al. A Meta-Analysis of Test Scores in Proctored and Unproctored Ability Assessments, 2020.

[36] Stefan Stieger, et al. What are participants doing while filling in an online questionnaire: A paradata collection tool and an empirical study, 2010, Comput. Hum. Behav.

[37] Mick P. Couper, et al. A Typology of Web Survey Paradata for Assessing Total Survey Error, 2019.

[38] John J. Prindle, et al. Paper-Based Assessment of the Effects of Aging on Response Time: A Diffusion Model Analysis, 2017, Journal of Intelligence.

[39] Robin D. Anderson, et al. Proctors Matter: Strategies for Increasing Examinee Effort on General Education Program Assessments, 2009, The Journal of General Education.

[40] Ulf Kröhne, et al. Computer-based competence tests in the National Educational Panel Study: The challenge of mode effects, 2011.

[41] Peter Lynn, et al. Assessing the Effect of Data Collection Mode on Measurement, 2010.

[42] Vasja Vehovar, et al. Investigating respondent multitasking in web surveys using paradata, 2016, Comput. Hum. Behav.

[43] Alper Bayazit, et al. Performance and duration differences between online and paper–pencil tests, 2012.

[44] Kai Kaspar, et al. Disclosure of sensitive behaviors across self-administered survey modes: a meta-analysis, 2014, Behavior Research Methods.

[45] Steven L. Wise, et al. An Application of Item Response Time: The Effort-Moderated IRT Model, 2006.

[46] Natalie Shlomo, et al. Estimation of an indicator of the representativeness of survey response, 2012.

[47] S. Wise, et al. The Generalizability of Motivation Filtering in Improving Test Score Validity, 2006.

[48] B. Csapó, et al. Computer-Based Assessment of School Readiness and Early Reasoning, 2014.

[49] Daniel H. Robinson, et al. Speed and Performance Differences among Computer-Based and Paper-Pencil Tests, 2004.

[50] Joseph A. Rios, et al. Online Proctored Versus Unproctored Low-Stakes Internet Test Administration: Is There Differential Test-Taking Behavior and Performance?, 2017.

[51] Dave Bartram, et al. Testing on the Internet: Issues, Challenges and Opportunities in the Field of Occupational Assessment, 2008.

[52] Wim Bloemers, et al. Cheating on Unproctored Internet Intelligence Tests: Strategies and Effects, 2016.

[53] Filip Lievens, et al. Dealing with the threats inherent in unproctored Internet testing of cognitive ability: Results from a large-scale operational test program, 2011.

[54] Ping Wan, et al. Assessing Individual-Level Impact of Interruptions during Online Testing, 2015.

[55] Penny Black, et al. Straightlining: Overview of Measurement, Comparison of Indicators, and Effects in Mail–Web Mixed-Mode Surveys, 2019.

[56] Jelke Bethlehem, et al. Indicators for the representativeness of survey response, 2009.

[57] S. Wise, et al. Setting Response Time Thresholds for a CAT Item Pool: The Normative Threshold Method, 2012.

[58] Lara B. Russell, et al. Some Thoughts on Gathering Response Processes Validity Evidence in the Context of Online Measurement and the Digital Revolution, 2017.

[59] Johannes Naumann, et al. Relating Product Data to Process Data from Computer-Based Competency Assessment, 2017.

[60] Frank Goldhammer, et al. The Transition to Computer-Based Testing in Large-Scale Assessments: Investigating (Partial) Measurement Invariance between Modes, 2016.

[61] A. Ryan, et al. Mobile Internet Testing: An Analysis of Equivalence, Individual Differences, and Reactions, 2015.

[62] Cornelis A. W. Glas, et al. A Bayesian Approach to Person Fit Analysis in Item Response Theory Models, 2003.

[63] Sandip Sinharay, et al. Determining the Overall Impact of Interruptions During Online Testing, 2014.

[64] Frauke Kreuter, et al. Improving Surveys with Paradata: Analytic Uses of Process Information, 2013.

[65] Joop J. Hox, et al. Mode effect or question wording? Measurement error in mixed mode surveys, 2015.

[66] Joseph A. Rios, et al. Identifying Low-Effort Examinees on Student Learning Outcomes Assessment: A Comparison of Two Approaches, 2014.

[67] Frank Goldhammer, et al. Measuring Ability, Speed, or Both? Challenges, Psychometric Solutions, and What Can Be Gained From Experimental Control, 2015, Measurement: Interdisciplinary Research and Perspectives.

[68] Richard Sale, et al. International Guidelines on Computer-Based and Internet-Delivered Testing: A Practitioner's Perspective, 2006.

[69] R. Bennett. Online Assessment and the Comparability of Score Meaning, 2003.

[70] Heiko Rölke. The ItemBuilder: A Graphical Authoring System for Complex Item Development, 2012.

[71] Steven L. Wise, et al. Setting the Response Time Threshold Parameter to Differentiate Solution Behavior From Rapid-Guessing Behavior, 2007.

[72] Neil A. Morelli, et al. Internet-Based, Unproctored Assessments on Mobile and Non-Mobile Devices: Usage, Measurement Equivalence, and Outcomes, 2015.

[73] Steven L. Wise, et al. Response Time Effort: A New Measure of Examinee Motivation in Computer-Based Tests, 2005.

[74] F. Schmiedek, et al. Cognitive benefits of last night's sleep: daily variations in children's sleep behavior are related to working memory fluctuations, 2015, Journal of Child Psychology and Psychiatry, and Allied Disciplines.

[75] Victor M. H. Borden, et al. The Effects of Motivational Instruction on College Students' Performance on Low-Stakes Assessment, 2015.