A General Weighted Average Representation of the Ordinary and Two-Stage Least Squares Estimands

It is standard practice in applied work to study the effect of a binary variable ("treatment") on an outcome of interest using linear models with additive effects. In this paper I study the interpretation of the ordinary and two-stage least squares estimands in such models when treatment effects are in fact heterogeneous. I show that in both cases the coefficient on treatment is identical to a convex combination of two other parameters (different for OLS and 2SLS), which can be interpreted as the average treatment effects on the treated and on the controls under additional assumptions. Importantly, the OLS and 2SLS weights on these parameters are inversely related to the proportion of each group: the more units get treatment, the less weight is placed on the effect on the treated. It follows that reliance on these implicit weights can have serious consequences for applied work. I illustrate some of these issues in four empirical applications from different fields of economics. I also develop a weighted least squares correction and simple diagnostic tools that applied researchers can use to avoid potential biases. In an important special case, my diagnostics require only knowledge of the proportion of treated units.
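
The OLS weighting pattern described above can be illustrated with a small numerical sketch. It is not the paper's own code or data: the function name `implied_weight_on_att`, the cell counts, the stratum effects, and the noiseless outcome equation are all hypothetical choices made for illustration. The design assumes a single binary covariate, so that the effects on the treated and on the controls are simple averages of the stratum-level effects, and it backs out the weight on the effect on the treated from the fitted coefficient rather than from any closed-form expression.

```python
import numpy as np

def implied_weight_on_att(cells):
    """cells: (x_value, n_treated, n_control, stratum_effect) per stratum.

    Builds a noiseless data set, regresses the outcome on a constant,
    treatment, and the covariate, and solves
        ols = w * ATT + (1 - w) * ATC
    for the implied weight w on the effect on the treated.
    """
    x, d, tau = [], [], []
    for x_val, n_treated, n_control, effect in cells:
        n = n_treated + n_control
        x += [x_val] * n
        d += [1] * n_treated + [0] * n_control
        tau += [effect] * n
    x, d, tau = np.array(x, float), np.array(d, float), np.array(tau, float)

    # Hypothetical outcome: covariate shift plus a stratum-specific effect.
    y = 2.0 * x + tau * d

    # Additive linear model: constant, treatment, covariate.
    design = np.column_stack([np.ones_like(d), d, x])
    ols = np.linalg.lstsq(design, y, rcond=None)[0][1]

    att = tau[d == 1].mean()  # average effect among treated units
    atc = tau[d == 0].mean()  # average effect among control units
    weight_att = (ols - atc) / (att - atc)
    return d.mean(), ols, att, atc, weight_att

# Same stratum effects, different proportions of treated units.
few_treated = implied_weight_on_att([(0, 10, 90, 1.0), (1, 50, 50, 3.0)])
many_treated = implied_weight_on_att([(0, 90, 10, 1.0), (1, 50, 50, 3.0)])

for label, result in (("few treated", few_treated), ("many treated", many_treated)):
    share, ols, att, atc, w = result
    print(f"{label}: share={share:.2f} ols={ols:.3f} "
          f"att={att:.3f} atc={atc:.3f} weight_on_att={w:.3f}")
```

In this constructed example the treatment coefficient is the same in both designs, but the implied weight on the effect on the treated is roughly 0.79 when 30% of units are treated and roughly 0.21 when 70% are treated, which is the inverse relationship between group size and weight stated above.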