Revisiting field experimentation: field notes for the future.

Field experiments came into increasing use in the social sciences during the 20th century. This article briefly reviews some important lessons in the design, analysis, and theory of field experiments that emerged from that experience. Topics include the importance of ensuring that selection into experiments and assignment to conditions occur properly, how to prevent and analyze attrition, the need to attend to power and effect size, how to measure partial treatment implementation and take it into account in analyses, modern analyses of quasi-experimental and multilevel data, Rubin's model, and the role of internal and external validity. The article ends with observations on the computer revolution in methodology and statistics, convergences in theory and methods across disciplines, the need for an empirical program of methodological research, the key problem of selection bias, and the inevitability of increased specialization in field experimentation in the years to come.
