Paper information - INSIGHT@DCU TRECVID 2019: Video to Text

In this paper, we describe the approach we developed for the TRECVID Video to Text task, specifically the free-text generation sub-task. This sub-task consists of generating a textual description using only information that can be extracted from the videos. We tackle the problem using a commonly used BLSTM network with an alternate enhance mechanism. To improve the model, we study the effect of using different datasets and features. One of the main problems in the video captioning challenge is the size of the vocabulary, which adds another level of complexity, as the model needs to produce a rich vocabulary without prior knowledge of the scene. Therefore, we also discuss the use of an image captioning module to guide the initial text obtained from the video.

Noel E. O'Connor | Kevin McGuinness | Luis Lebron
