Multimodal feature-wise co-attention method for visual question answering