Measuring the effectiveness of pedagogical innovations using multiple baseline testing

A great deal of literature focuses on innovations designed to improve educational performance. Although some innovations address learning in a very specific domain, others influence student learning more generally because they are applicable regardless of content (e.g., mechanisms for delivering new content, new strategies for student-student interaction, and the application of new technologies). Many instructors hypothesize that a particular innovation will enhance student learning and, consequently, the ability to achieve desired learning objectives. Testing such a hypothesis is difficult when confounding factors exist in the students' learning environment, such as scheduled breaks, social stressors, and activities occurring in other courses. Multiple baseline testing is a promising strategy for statistically controlling the influence of confounding factors when innovations are implemented consistently across multiple groups of students. The strategy involves measuring student performance, implementing the innovation at a randomly selected time, and continuing to measure student performance as the innovation is integrated into the course. The impact of the intervention can then be estimated using time-series regression. This paper presents the mechanics of multiple baseline testing, discusses the relatively small body of research on the method that exists outside the medical and biological fields, and provides recommendations for managing threats to validity in engineering education research.

Introduction

In much of the pedagogy literature, authors attempt to describe a pedagogical innovation and demonstrate its impact on student learning. These studies include qualitative measures of improvement, such as student feedback in learning logs 1, and quantitative measures, such as performance on examinations. The vast majority of researchers assess the impacts of new teaching methods primarily through correlational or comparative studies. They often gather empirical data to determine whether there is an improvement, combined with qualitative feedback from student reflections to understand why the intervention was successful or unsuccessful. Nearly all of these studies aim to measure the improvement in learning resulting from an intervention. In essence, they perform a hypothesis test (i.e., testing whether the implementation of intervention X yields a statistically significant improvement in achievement of learning objectives) in order to infer causal relationships. The problem with such approaches is that they are inherently susceptible to limitations in internal and external validity, because numerous confounding factors may influence achievement of learning objectives, including instructor effectiveness, social stress, time of year, and others. Although several correlational studies have claimed to indicate causal relationships in education research, several researchers rightly question the legitimacy of such studies.
According to these researchers, a causal relationship can only be inferred if the following criteria are satisfied:

- sufficient evidence that the effect or outcome variable occurs as a consequence of introducing a specific treatment variable;
- clear indication of the absence of any alternate plausible explanation for the observed effect; and
- evidence that the causal factor or treatment variable precedes the occurrence of the observed effect.

In light of these requirements, researchers posit that correlation-based, cross-sectional research that measures outcome variables at a single point in time inherently fails to provide adequate evidence for causal inference. In fact, it is impossible to provide evidence that the causal factor preceded the occurrence of the observed effect. Such studies also do not adequately control for extraneous or alternate plausible explanations for the observed effect. Ironically, results from a 2004 survey of five teaching and learning journals by Robinson et al. (2007) indicate that 43% of non-intervention studies contained causal statements. Such trends have led Hsieh et al. (2005), Seethaler and Fuchs (2005), and Robinson et al. (2007) to express concern about research rigor; they encourage education researchers to reinvigorate their intervention research.

Fortunately, experimental and quasi-experimental methods exist that can achieve validity and should be used to make valid causal inferences. As noted by Thompson et al. (2005), randomized controlled intervention experiments are a requirement for providing definitive answers to causal questions. Randomized controlled intervention studies are true experiments in which subjects are randomly assigned to at least two conditions, namely an intervention (treatment) group and a control group. The researcher intentionally manipulates or introduces the treatment variable in the intervention group. The control group, which does not receive the intervention, is compared with the treatment group to estimate the effect of the independent variable, as illustrated in the sketch below. Accordingly, causal inferences based on the difference in observed outcomes between the treatment and control groups can be attributed to the intervention. Randomized controlled intervention studies thus systematically account for, or eliminate, alternate plausible explanations, enabling definitive causal inferences.

In educational research, contrary to intuition, it has been established that the number of articles based on randomized experiments has declined considerably over the years. According to Hsieh et al. (2005), the results of a survey of four educational journals indicate that the percentage of articles featuring randomized experiments decreased from 47% in 1983 to 34% in 1995, and to only 26% in 2004. In another study by Snyder et al. (2002), a review of 450 quantitative group studies found that only 10% represented randomized controlled experiments.
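To make the comparison logic concrete, the following is a minimal sketch of a randomized controlled comparison, assuming Python with numpy and scipy; the class size, scores, and effect size are hypothetical illustrations and are not drawn from any of the studies cited.

```python
# Minimal sketch of a randomized controlled comparison (hypothetical data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_students = 60
baseline = rng.normal(loc=75, scale=8, size=n_students)  # latent exam aptitude

# Random assignment: shuffle students, split into treatment and control halves.
order = rng.permutation(n_students)
treatment_idx = order[: n_students // 2]
control_idx = order[n_students // 2 :]

# Simulated outcomes: the (hypothetical) intervention adds ~4 points on average.
scores = baseline.copy()
scores[treatment_idx] += 4

# Because assignment was random, a difference in group means beyond chance
# can be attributed to the intervention rather than to confounding factors.
t_stat, p_value = stats.ttest_ind(scores[treatment_idx], scores[control_idx])
print(f"mean difference = {scores[treatment_idx].mean() - scores[control_idx].mean():.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

Because group membership is determined by the random shuffle alone, a statistically significant difference in means can reasonably be attributed to the intervention rather than to pre-existing differences between groups, which is precisely the property that makes randomized designs the benchmark for causal inference.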
This decline in randomized experimental studies may partly be attributed to the following factors: (1) randomized designs rarely duplicate real-life situations; (2) the practical conditions for randomized experiments are generally not satisfied; (3) the randomization process can be especially challenging in an educational setting, where study groups often cannot be altered to form comparable intervention and control groups; and (4) ethical considerations emerge when a promising educational intervention is provided to the intervention group while the control group is denied its benefits. Interestingly, the decline in the proportion of experimental education studies has occurred despite the fact that legislation (e.g., the No Child Left Behind Act of 2001) and numerous authors have elevated randomized experiments as the "gold standard" for conducting scientifically credible research.

One major reason for the decline in the number of intervention studies is the perception among researchers that the methodological rigor required to reach scientifically credible conclusions is impractical in an educational setting. As discussed, however, correlation-based studies have been criticized because definitive causal inferences cannot be established. There is therefore a pressing need in educational research to understand how to conduct rigorous research that yields valid causal inferences and is also practical.

A method with great potential for experimental research in the pedagogical domain is multiple baseline testing (MBT). This experimental technique allows a researcher to conduct a controlled and internally valid experiment when a longitudinal assessment strategy is practical. Although MBT is time intensive, the method is rigorous because its inherent structure limits threats to validity and reliability and allows the researcher to make valid causal inferences. This promising research design remains underused despite its ability to produce scientifically sound results in educational research. The objective of this paper is to describe the MBT method, how to form hypotheses that are appropriate for MBT, how to structure a proper MBT experiment, methods for promoting validity and reliability during the MBT process, and proper statistical approaches for time-series data subject to autocorrelation. We present this guidance in the context of six experiments conducted in professional research and two experiments conducted in the classroom. We expect that this guidance can be used by future investigators to increase the rigor of their pedagogical research and to serve as a foundation for experimental research aimed at establishing causal relationships. At present, there is no single resource on the proper use of MBT for educational research despite its utility, practicality, and rigor for drawing causal inferences about improvements resulting from pedagogical innovations. Thus, this paper should be of interest to researchers across all pedagogical domains.

Background and Rationale of MBT

The MBT design methodology was first introduced in the Journal of Applied Behavior Analysis by Baer et al. (1968) 23, where the authors argued that the effects of experimental manipulations, if any, could be definitively illustrated with the MBT structure. Since then, several research methodologists have recommended the use of the MBT design to evaluate the effectiveness of interventions in various fields.
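As an illustration of the kind of analysis the MBT structure supports, the following is a minimal sketch of an interrupted time-series regression with staggered intervention points, assuming Python with numpy, pandas, and statsmodels; the three course sections, intervention weeks, and simulated scores are hypothetical and are not drawn from the experiments referenced in this paper.

```python
# Minimal sketch of a multiple-baseline analysis via interrupted time-series
# regression (hypothetical data; pooled HAC errors are a simplification).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
weeks = np.arange(1, 16)                      # 15 weekly performance measurements
start_week = {"A": 5, "B": 8, "C": 11}        # staggered, randomly chosen start points

rows = []
for section, start in start_week.items():
    treated = (weeks >= start).astype(int)
    # Simulated scores: baseline trend + hypothetical +3 point intervention effect
    scores = 70 + 0.2 * weeks + 3 * treated + rng.normal(0, 1.5, size=weeks.size)
    rows.append(pd.DataFrame({"section": section, "week": weeks,
                              "treated": treated, "score": scores}))
data = pd.concat(rows, ignore_index=True)

# Time trend + intervention indicator + section dummies; Newey-West (HAC)
# standard errors guard against residual autocorrelation in the series.
X = pd.get_dummies(data[["week", "treated", "section"]],
                   columns=["section"], drop_first=True, dtype=float)
X = sm.add_constant(X)
model = sm.OLS(data["score"], X).fit(cov_type="HAC", cov_kwds={"maxlags": 2})

print(model.summary())
print("Durbin-Watson:", durbin_watson(model.resid))  # values near 2 suggest little autocorrelation
```

Because the intervention indicator switches on at a different, randomly chosen week in each section, a consistent and significant coefficient on the treated term across the staggered start points makes calendar-linked confounds (scheduled breaks, stressors, activities in other courses) far less plausible explanations, which is the core logic of the multiple baseline design.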
The medical and biological fields in particular have realized the benefit of MBT. Unfortunately, it remains underused in education research. Several methods exist to test the effects of pedagogical interventions. Since the implementation of true randomized experiments that deprive the control group of potential interventions is often considered unethical, researchers a

[1]  D. M. Baer, et al.  Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1968.

[2]  R. Kirk Experimental Design: Procedures for the Behavioral Sciences , 1970 .

[3]  Philip Lambert,et al.  The Journal of Experimental Education , 1970 .

[4]  Alan E. Kazdin,et al.  On resolving ambiguities of the multiple-baseline design: Problems and recommendations , 1975 .

[5]  D. Barlow,et al.  Single Case Experimental Designs: Strategies for Studying Behavior Change , 1976 .

[6]  Bradley E. Huitema,et al.  The analysis of covariance and alternatives , 1980 .

[7]  P. J. Watson, et al.  The non-concurrent multiple baseline across-individuals design: an extension of the traditional multiple baseline design. Journal of Behavior Therapy and Experimental Psychiatry, 1981.

[8]  M. Patton,et al.  Evaluation: A Systematic Approach , 1980 .

[9]  J. Cooper,et al.  Applied behavior analysis in education , 1982 .

[10]  M. E. Boyle Single Case Experimental Designs: Strategies for Studying Behavior Change , 1983 .

[11]  Malcolm R. McNeil,et al.  Apraxia of speech : physiology, acoustics, linguistics, management , 1984 .

[12]  Karl R. White,et al.  Ethical, Practical, and Scientific Considerations of Randomized Experiments in Early Childhood Special Education , 1986 .

[13]  John B. Willett,et al.  Some Results on Reliability for the Longitudinal Measurement of Change: Implications for the Design of Studies of Individual Growth , 1989 .

[14]  James M. Johnston,et al.  Strategies and tactics of behavioral research , 1993 .

[15]  K. Ottenbacher, et al.  The statistical analysis of single-subject data: a comparative examination. Physical Therapy, 1994.

[16]  Miriam Lenehan,et al.  Effects of learning-style intervention on college students' achievement, anxiety, anger, and curiosity. , 1994 .

[17]  P. Diggle Analysis of Longitudinal Data , 1995 .

[18]  Matthew J. Koehler,et al.  Regulated Randomization: A Potentially Sharper Analytical Tool for the Multiple-Baseline Design , 1998 .

[19]  J. McKean,et al.  Design Specification Issues in Time-Series Intervention Models , 2000 .

[20]  Helen Lingard The effect of first aid training on objective safety behaviour in Australian small business construction firms , 2001 .

[21]  W. Shadish,et al.  Experimental and Quasi-Experimental Designs for Generalized Causal Inference , 2001 .

[22]  Linda M. Goldenhar,et al.  Guide to Evaluating the Effectiveness of Strategies for Preventing Work Injuries: How to Show Whether a Safety Intervention Really Works , 2001 .

[23]  F. Mosteller,et al.  Evidence matters : randomized trials in education research , 2002 .

[24]  Patricia Snyder,et al.  Examination of Quantitative Methods Used in Early Intervention Research: Linkages With Recommended Practices , 2002 .

[25]  Geoffrey D. Borman Experiments for Educational Evaluation and Improvement , 2002, Peabody Journal of Education.

[26]  Craig R. Ramsay, et al.  Interrupted time series designs in health technology assessment: lessons from two systematic reviews of behavior change strategies. International Journal of Technology Assessment in Health Care, 2003.

[27]  Margaret T. May, et al.  Statistical Methods for the Analysis of Repeated Measurements. Charles S. Davis. Heidelberg: Springer Verlag, 2002, pp. 415, £59.50 (HB), ISBN: 0-387-95370-1, 2003.

[28]  J. Singer,et al.  Applied Longitudinal Data Analysis , 2003 .

[29]  James R. Kenyon,et al.  Statistical Methods for the Analysis of Repeated Measurements , 2003, Technometrics.

[30]  K. E. Barron,et al.  Testing Moderator and Mediator Effects in Counseling Psychology Research. , 2004 .

[31]  J. Ware,et al.  Applied Longitudinal Analysis , 2004 .

[32]  Jesse L. M. Wilkins,et al.  Mathematics and Science Self-Concept: An International Investigation , 2004 .

[33]  Alexander C. Wagenaar,et al.  The Value of Interrupted Time-Series Experiments for Community Intervention Research , 2000, Prevention Science.

[34]  Tim Urdan,et al.  Predictors of Academic Self-Handicapping and Achievement: Examining Achievement Goals, Classroom Goal Structures, and Culture. , 2004 .

[35]  Craig H. Kennedy,et al.  Nonconcurrent Multiple Baseline Designs and the Evaluation of Educational Systems , 2004 .

[36]  Helen Lingard,et al.  Occupational health and safety in construction project management , 2004 .

[37]  Lynn S. Fuchs,et al.  A Drop in the Bucket: Randomized Controlled Trials Testing Reading and Math Interventions , 2005 .

[38]  John T. E. Richardson,et al.  Instruments for obtaining student feedback: a review of the literature , 2005 .

[39]  Joel R. Levin,et al.  Randomized Classroom Trials on Trial , 2005 .

[40]  Daniel H. Robinson,et al.  Is Educational Intervention Research on the Decline , 2005 .

[41]  Patricia Snyder,et al.  Evaluating the Quality of Evidence from Correlational Research for Evidence-Based Practice , 2005 .

[42]  Daniel H. Robinson,et al.  Empirical methods for evaluating educational interventions , 2005 .

[43]  Matt Tincani,et al.  The Picture Exchange Communication System: Effects on Manding and Speech Development for School-Aged , 2006 .

[44]  Joseph W. McKean,et al.  Identifying Autocorrelation Generated by Various Error Processes in Interrupted Time-Series Regression Designs , 2007 .

[45]  R. B. Johnson,et al.  Educational Research: Quantitative, Qualitative, and Mixed Approaches , 2007 .

[46]  Sharon Vaughn,et al.  The Incidence of “Causal” Statements in Teaching-and-Learning Research Journals , 2007 .

[47]  Anthony Shakeshaft, et al.  The multiple baseline design for evaluating population-based research. American Journal of Preventive Medicine, 2007.

[48]  Bernard C. Beins Research Methods: A Tool for Life , 2008 .

[49]  David Morgan,et al.  Single-Case Research Methods for the Behavioral and Health Sciences , 2008 .

[50]  Phyllis Solomon,et al.  Randomized Controlled Trials , 2009 .

[51]  Shelley L. Leininger,et al.  Single Subject Designs in Biomedicine , 2009 .

[52]  이수정.  Overseas occupational health nursing information: an introduction to the U.S. National Institute for Occupational Safety and Health, 2009.

[53]  David L. Gast,et al.  Single subject research methodology in behavioral sciences , 2010 .

[54]  Alan E. Kazdin,et al.  Single-Case Research Designs: Methods for Clinical and Applied Settings , 2010 .

[55]  R. D. de Bie,et al.  Feasibility and potential effectiveness of a non-pharmacological multidisciplinary care programme for persons with generalised osteoarthritis: a randomised, multiple-baseline single-case study , 2012, BMJ Open.

[56]  Stephen B. Richards,et al.  Single Subject Research: Applications in Educational and Clinical Settings , 2013 .