VERGE in VBS 2020

This paper demonstrates VERGE, an interactive video retrieval engine for browsing a collection of images or videos and searching for specific content. The engine integrates a multitude of retrieval methodologies, including visual and textual search, along with further capabilities such as fusion and reranking. All search options and results are presented in a web application designed for a user-friendly experience.
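
Since the abstract mentions fusion and reranking of results coming from multiple retrieval modules, the following is a minimal sketch of one common approach, weighted score-level late fusion, assuming each module returns per-shot similarity scores. The function names, weights, and min-max normalisation are illustrative assumptions and do not reflect the actual VERGE implementation.

```python
# Minimal sketch of score-level late fusion for reranking, assuming two
# retrieval modules that each return {shot_id: similarity} dictionaries.
# Names, weights, and the normalisation scheme are illustrative assumptions,
# not the actual VERGE implementation.

def min_max_normalise(scores):
    """Rescale raw similarity scores to [0, 1] so modalities are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {shot: (s - lo) / span for shot, s in scores.items()}

def fuse_and_rerank(visual_scores, textual_scores, w_visual=0.5, w_textual=0.5):
    """Combine visual and textual result lists with a weighted sum and rerank."""
    visual = min_max_normalise(visual_scores)
    textual = min_max_normalise(textual_scores)
    shots = set(visual) | set(textual)
    fused = {
        shot: w_visual * visual.get(shot, 0.0) + w_textual * textual.get(shot, 0.0)
        for shot in shots
    }
    # Highest fused score first: the order in which results would be shown.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical shot identifiers and similarity scores for illustration only.
    visual = {"shot_17": 0.91, "shot_42": 0.64, "shot_08": 0.33}
    textual = {"shot_42": 0.88, "shot_17": 0.40, "shot_99": 0.75}
    for shot, score in fuse_and_rerank(visual, textual):
        print(f"{shot}: {score:.3f}")
```

In a sketch like this, shots retrieved by both modalities naturally rise to the top, while the per-modality weights let the user bias the ranking toward visual or textual evidence.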
