Comparison of Normalized Gain and Cohen's "d" for Analyzing Gains on Concept Inventories

Measuring student learning is a complicated but necessary task for understanding the effectiveness of instruction and issues of equity in college STEM courses. Our investigation focused on how claims about student learning are affected by the choice between two commonly used methods for analyzing shifts in concept inventory scores. The methods are Hake's gain (g), the most common method in physics education research and other discipline-based education research fields, and Cohen's d, which is broadly used in education research and many other fields. Data for the analyses came from the Learning Assistant Supported Student Outcomes (LASSO) database and included test scores from 4,551 students on physics, chemistry, biology, and math concept inventories from 89 courses at 17 institutions across the United States. We compared the two methods across all of the concept inventories. The results showed that the two methods led to different inferences about student learning and equity because g is biased in favor of populations with high pretest scores. We include recommendations for analyzing and reporting student learning data.
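To make the contrast between the two measures concrete, the following is a minimal Python sketch and not the analysis code used in the study. It assumes the commonly cited course-level form of Hake's gain, g = (mean post - mean pre) / (maximum score - mean pre), and one common form of Cohen's d that divides the mean difference by the root mean square of the pretest and posttest standard deviations; the abstract does not state which variants were used. The function names and the two small score lists are hypothetical and serve only as illustration.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_gain(pre, post, max_score=100.0):
    """Course-level Hake gain: mean raw gain over the maximum possible gain.

    Assumes scores are percentages unless max_score is overridden.
    """
    pre_mean = mean(pre)
    post_mean = mean(post)
    return (post_mean - pre_mean) / (max_score - pre_mean)


def cohens_d(pre, post):
    """Cohen's d: mean difference over a pooled standard deviation.

    Assumption: the pooled SD is the root mean square of the sample
    SDs of the pretest and posttest scores.
    """
    pooled_sd = sqrt((stdev(pre) ** 2 + stdev(post) ** 2) / 2)
    return (mean(post) - mean(pre)) / pooled_sd


# Hypothetical scores (percent correct), invented for illustration only.
# Both groups improve by 20 points with the same spread; only the
# pretest level differs.
low_pre, low_post = [20, 30, 40], [40, 50, 60]
high_pre, high_post = [50, 60, 70], [70, 80, 90]

print("low pretest:  g =", round(normalized_gain(low_pre, low_post), 2),
      " d =", round(cohens_d(low_pre, low_post), 2))
print("high pretest: g =", round(normalized_gain(high_pre, high_post), 2),
      " d =", round(cohens_d(high_pre, high_post), 2))
```

In this invented example both groups have the same raw gain and the same spread, so d is identical for the two groups, while g is larger for the group that started with higher pretest scores. That is the kind of pretest dependence the abstract describes, shown here only under the stated assumptions.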
