Learning Modality-Invariant Features by Cross-Modality Adversarial Network for Visual Question Answering