Transformer-based Sparse Encoder and Answer Decoder for Visual Question Answering