Improved Visual Grounding through Self-Consistent Explanations