Learning the Difference that Makes a Difference with Counterfactually-Augmented Data