Robust Visual Question Answering Based on Counterfactual Samples and Relationship Perception