How do professors format exams?: an analysis of question variety at scale

This study analyzes the use of paper exams in college-level STEM courses. It leverages a unique dataset of nearly 1,800 exams, which were scanned into a web application and then processed by a team of annotators to yield a detailed snapshot of how instructors currently structure exams. The investigation focuses on the variety of question formats and how they are applied across different course topics. The analysis divides questions into seven top-level categories and finds significant differences among them in positioning, use across subjects, and student performance. It also reveals a strong tendency within the collection for instructors to order questions from easier to harder. A linear mixed-effects model is used to estimate the reliability of different question types. Long writing questions stand out for their high reliability, while binary and multiple-choice questions have low reliability. The model suggests that more than three multiple-choice questions, or more than five binary questions, are required to attain the same reliability as a single long writing question. A correlation analysis across seven response types finds that correlations between student abilities on different question types exceed 70 percent for all pairs, although binary and multiple-choice questions stand out for having unusually low correlations with all other question types.
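The abstract does not report the underlying reliability estimates, but the item-equivalence comparison (roughly three multiple-choice or five binary questions per long writing question) can be illustrated with the Spearman-Brown prophecy formula, a standard way of relating test length to reliability. The sketch below is illustrative only: the per-item reliabilities (r_long, r_mc, r_binary) are assumed placeholder values chosen to produce numbers of the same order as those reported, not figures from the study, whose estimates come from a linear mixed-effects model rather than this closed-form calculation.

    # Illustrative sketch: how many items of one type would be needed to match
    # the reliability of a single long writing question, via the Spearman-Brown
    # prophecy formula. The per-item reliabilities below are ASSUMED placeholder
    # values, not estimates taken from the study.

    def spearman_brown(r_single: float, n: float) -> float:
        """Reliability of a test of n parallel items, each with reliability r_single."""
        return n * r_single / (1 + (n - 1) * r_single)

    def items_needed(r_item: float, r_target: float) -> float:
        """Number of items with per-item reliability r_item needed to reach r_target."""
        return r_target * (1 - r_item) / (r_item * (1 - r_target))

    r_long = 0.50    # assumed reliability of one long writing question
    r_mc = 0.25      # assumed reliability of one multiple-choice question
    r_binary = 0.16  # assumed reliability of one binary (true/false) question

    print(items_needed(r_mc, r_long))      # ~3.0 multiple-choice items
    print(items_needed(r_binary, r_long))  # ~5.25 binary items

The calculation only conveys the intuition that question types with low per-item reliability must be compensated by additional items; the study itself estimates reliability from student responses with a mixed-effects model.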
