A Cautionary Note on the Use of Matching to Estimate Causal Effects: An Empirical Example Comparing Matching Estimates to an Experimental Benchmark

In recent years, social scientists have increasingly turned to matching as a method for drawing causal inferences from observational data. Matching compares those who receive a treatment to those with similar background attributes who do not receive a treatment. Researchers who use matching frequently tout its ability to reduce bias, particularly when applied to data sets that contain extensive background information. Drawing on a randomized voter mobilization experiment, the authors compare estimates generated by matching to an experimental benchmark. The enormous sample size enables the authors to exactly match each treated subject to 40 untreated subjects. Matching greatly exaggerates the effectiveness of pre-election phone calls encouraging voter participation. Moreover, it can produce nonsensical results: Matching suggests that another pre-election phone call that encouraged people to wear their seat belts also generated huge increases in voter turnout. This illustration suggests that caution is warranted when applying matching estimators to observational data, particularly when one is uncertain about the potential for biased inference.
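The exact-matching comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the function name `exact_match_att`, the record layout, the covariate names, and the rule of keeping the first 40 controls per profile are all assumptions made for the example, and the toy data are invented.

```python
from collections import defaultdict
from statistics import mean


def exact_match_att(records, covariates, max_controls=40):
    """Average treated-minus-matched-control outcome difference.

    Hypothetical helper: each record is a dict with a boolean 'treated',
    a numeric 'outcome', and one entry per name in `covariates`.
    """
    # Group untreated outcomes by their exact covariate profile.
    controls_by_profile = defaultdict(list)
    for r in records:
        if not r["treated"]:
            profile = tuple(r[c] for c in covariates)
            controls_by_profile[profile].append(r["outcome"])

    # Compare each treated subject to untreated subjects with the same profile.
    diffs = []
    for r in records:
        if r["treated"]:
            profile = tuple(r[c] for c in covariates)
            # Assumption: keep at most the first `max_controls` matches.
            matched = controls_by_profile.get(profile, [])[:max_controls]
            if matched:  # treated subjects with no exact match are dropped
                diffs.append(r["outcome"] - mean(matched))

    return mean(diffs) if diffs else None


# Invented toy data: outcome is 1 if the subject voted, 0 otherwise.
records = [
    {"treated": True, "age": 30, "voted_before": 1, "outcome": 1},
    {"treated": False, "age": 30, "voted_before": 1, "outcome": 1},
    {"treated": False, "age": 30, "voted_before": 1, "outcome": 0},
    {"treated": True, "age": 50, "voted_before": 0, "outcome": 0},
    {"treated": False, "age": 50, "voted_before": 0, "outcome": 0},
    {"treated": True, "age": 70, "voted_before": 1, "outcome": 1},  # no match
]

estimate = exact_match_att(records, ["age", "voted_before"])
print(estimate)
```

The sketch balances only the covariates it is given. If treated and untreated subjects differ on attributes that are not recorded, the estimate stays biased no matter how exact the match is, which is the concern the abstract raises.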
