Propensity scores.

This has been an interesting exchange of views. I would like to restate that I found Dr Rubin's original paper [1] extremely informative, and I apologize for posing a question that he found too confusing to answer. I would also like to thank Drs Pearl [2] and Sjolander [3] for providing both the simple answer (i.e. yes, the propensity score method would introduce bias if used in an M-bias situation) and the more detailed background on why this occurs (i.e. because the underlying assumptions are clearly violated). Although Dr Rubin previously responded that he does not find DAGs useful [4], it is clear that many investigators would not have realized that the underlying assumptions of the propensity score method are violated in this situation.

To be clearer this time: M-bias is induced when one adjusts for an event (e.g. a previous episode) that is caused by two independent unobserved factors, one that encourages treatment and one that influences disease. Because ‘previous episode’ is associated in such cases with both treatment and disease, the natural tendency is to assume that treatment assignment is non-ignorable when not conditioning on previous episode, and ignorable when conditioning on it (which would lead one to include ‘previous episode’ in the propensity score analysis). In fact, because previous episode is a common effect of the two unobserved factors, conditioning on it makes those factors dependent and thereby opens a non-causal path between treatment and disease. DAGs make it easy to see that including such covariates would produce bias and, more generally, help one to distinguish appropriate covariates from bias-producing ones. The danger is not merely academic; it arises most often when one is tempted to include covariates that are proxies for unmeasured confounders. If propensity scores are to be used appropriately, investigators must understand, and make explicit, why they believe the underlying assumption of strong ignorability holds for the chosen set of covariates.
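The M-structure described above can be illustrated with a small simulation. This is a minimal sketch under assumptions chosen only for illustration. None of them come from the papers cited. All variables are binary, and the probabilities 0.2 and 0.8 are arbitrary. ‘Previous episode’ is assumed to occur whenever either unobserved factor is present, and the true treatment effect is zero. The function names are hypothetical. Because ‘previous episode’ is the only covariate, stratifying on the estimated propensity score P(T=1 | previous episode) is equivalent to stratifying on previous episode itself.

```python
import random
from collections import defaultdict


def simulate(n, seed=1):
    """Hypothetical M-structure: U1 -> treatment, U2 -> disease, U1 -> prev <- U2.

    Treatment has no effect on disease. Returns (treat, prev, disease) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u1 = rng.random() < 0.5  # unobserved factor that encourages treatment
        u2 = rng.random() < 0.5  # unobserved factor that influences disease
        prev = int(u1 or u2)     # 'previous episode': a common effect (collider)
        treat = int(rng.random() < (0.8 if u1 else 0.2))
        disease = int(rng.random() < (0.8 if u2 else 0.2))  # no treatment effect
        rows.append((treat, prev, disease))
    return rows


def risk_difference(rows):
    """Crude risk difference P(D=1 | T=1) - P(D=1 | T=0)."""
    treated = [d for t, _, d in rows if t == 1]
    control = [d for t, _, d in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ps_stratified_rd(rows):
    """Risk difference stratified on the estimated propensity score P(T=1 | prev)."""
    treat_by_prev = defaultdict(list)
    for t, p, _ in rows:
        treat_by_prev[p].append(t)
    ps = {p: sum(ts) / len(ts) for p, ts in treat_by_prev.items()}

    strata = defaultdict(list)
    for row in rows:
        strata[ps[row[1]]].append(row)  # group subjects by their propensity score

    # Average the within-stratum risk differences, weighted by stratum size.
    total = len(rows)
    return sum(len(s) / total * risk_difference(s) for s in strata.values())


rows = simulate(200_000)
print(round(risk_difference(rows), 3))   # expected value 0
print(round(ps_stratified_rd(rows), 3))  # expected value -0.125
```

Under these assumptions the crude risk difference has expected value 0. That is the correct answer here, because treatment and disease share no common cause. Stratifying on the propensity score, which in this setup amounts to conditioning on the collider, has an expected value of -0.125. This bias comes entirely from the covariate that was supposed to remove confounding. Nothing in the observed data reveals which of the two answers is correct. Only the assumed causal structure does, which is why the reasons for choosing the covariates need to be stated explicitly.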
Failure to make these reasons explicit is the equivalent of a physician explaining all the benefits of a treatment while omitting its side effects and associated risks, a practice that is considered unethical because of its lack of transparency. For propensity scores to be used widely and reliably, there need to be transparent methods (understandable by the