Paper Information - Spatiotemporal-Textual Co-Attention Network for Video Question Answering

Spatiotemporal-Textual Co-Attention Network for Video Question Answering

Visual Question Answering (VQA) is the task of producing a natural language answer given an image or video and a natural language question about it. Despite recent progress on VQA, existing works primarily foc...

Zhang Yongdong | Zha Zheng-Jun | Liu Jiawei | Yang Tianhao