VERGE in VBS 2020

This paper demonstrates VERGE, an interactive video retrieval engine for browsing a collection of images or videos and searching for specific content. The engine integrates a multitude of retrieval methodologies, including visual and textual search, along with further capabilities such as fusion and reranking. All search options and results are presented in a web application designed for a user-friendly experience.
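
Since the abstract mentions fusion and reranking of results coming from multiple retrieval modules, the following is a minimal sketch of one common approach, weighted score-level late fusion, assuming each module returns per-shot similarity scores. The function names, weights, and min-max normalisation are illustrative assumptions and do not reflect the actual VERGE implementation.

```python
# Minimal sketch of score-level late fusion for reranking, assuming two
# retrieval modules that each return {shot_id: similarity} dictionaries.
# Names, weights, and the normalisation scheme are illustrative assumptions,
# not the actual VERGE implementation.

def min_max_normalise(scores):
    """Rescale raw similarity scores to [0, 1] so modalities are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {shot: (s - lo) / span for shot, s in scores.items()}

def fuse_and_rerank(visual_scores, textual_scores, w_visual=0.5, w_textual=0.5):
    """Combine visual and textual result lists with a weighted sum and rerank."""
    visual = min_max_normalise(visual_scores)
    textual = min_max_normalise(textual_scores)
    shots = set(visual) | set(textual)
    fused = {
        shot: w_visual * visual.get(shot, 0.0) + w_textual * textual.get(shot, 0.0)
        for shot in shots
    }
    # Highest fused score first: the order in which results would be shown.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical shot identifiers and similarity scores for illustration only.
    visual = {"shot_17": 0.91, "shot_42": 0.64, "shot_08": 0.33}
    textual = {"shot_42": 0.88, "shot_17": 0.40, "shot_99": 0.75}
    for shot, score in fuse_and_rerank(visual, textual):
        print(f"{shot}: {score:.3f}")
```

In a sketch like this, shots retrieved by both modalities naturally rise to the top, while the per-modality weights let the user bias the ranking toward visual or textual evidence.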
