IVIST: Interactive VIdeo Search Tool in VBS 2020

This paper presents a new video retrieval tool, Interactive VIdeo Search Tool (IVIST), which participates in the 2020 Video Browser Showdown (VBS). As a video retrieval tool, IVIST is equipped with proper and high-performing functionalities such as object detection, dominant-color finding, scene-text recognition and text-image retrieval. These functionalities are constructed with various deep neural networks. By adopting these functionalities, IVIST performs well in searching users’ desirable videos. Furthermore, due to user-friendly user interface, IVIST is easy to use even for novice users. Although IVIST is developed to participate in VBS, we hope that it will be applied as a practical video retrieval tool in the future, dealing with actual video data on the Internet.

[1]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[4]  Fred L. Bookstein,et al.  Principal Warps: Thin-Plate Splines and the Decomposition of Deformations , 1989, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Xiang Bai,et al.  ASTER: An Attentional Scene Text Recognizer with Flexible Rectification , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[8]  Loutfi Nuaymi,et al.  HTTP/2-based Frame Discarding for Low-Latency Adaptive Video Streaming , 2019, ACM Trans. Multim. Comput. Commun. Appl..

[9]  Ralph Gasser,et al.  Interactive Search or Sequential Browsing? A Detailed Analysis of the Video Browser Showdown 2018 , 2019, ACM Trans. Multim. Comput. Commun. Appl..

[10]  Yoshua Bengio,et al.  Attention-Based Models for Speech Recognition , 2015, NIPS.

[11]  Luca Rossetto,et al.  Interactive video search tools: a detailed analysis of the video browser showdown 2015 , 2016, Multimedia Tools and Applications.

[12]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.