FDU Participation in TRECVID 2019 VTT Task

This notebook paper presents the system design of the FDU team for the TRECVID 2019 [1] Video to Text (VTT) task. Our approach uses temporal concept prediction as an auxiliary task to assist caption generation. The concept prediction module generates a context sequence of latent semantic features, which is then fused into the captioning module. We demonstrate the effectiveness of the auxiliary task and of the overall captioning system.
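
To make the data flow concrete, the following is a minimal sketch of how a per-time-step concept predictor could produce a context sequence that is then fused with the visual features passed to a captioner. It is an illustration under stated assumptions, not the system described in this paper: the dimensions `T`, `D`, and `K`, the random weights, the sigmoid concept scores, and fusion by concatenation are all hypothetical choices made for this example.

```python
import math
import random


def linear(x, W, b):
    """Apply the affine map W x + b to a single feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def sigmoid(v):
    """Element-wise logistic function."""
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def predict_context(frames, W, b):
    """Auxiliary task: one vector of concept scores per time step."""
    return [sigmoid(linear(f, W, b)) for f in frames]


def fuse(frames, context):
    """Fuse by concatenation (an assumption; other fusion schemes are possible)."""
    return [f + c for f, c in zip(frames, context)]


random.seed(0)
T, D, K = 4, 6, 3  # time steps, feature size, concept count (all hypothetical)
frames = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
W = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(K)]
b = [0.0] * K

context = predict_context(frames, W, b)  # context sequence, shape T x K
fused = fuse(frames, context)            # captioner input, shape T x (D + K)
```

In this sketch the captioning module would consume `fused` in place of the raw frame features, so each decoding step sees both the visual evidence and the predicted concept scores.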