Visual Question Answering Combining Multi-modal Feature Fusion and Multi-Attention Mechanism