Spatiotemporal-Textual Co-Attention Network for Video Question Answering
Visual Question Answering (VQA) is the task of producing a natural language answer given an image or video paired with a natural language question. Despite recent progress on VQA, existing works primarily foc...