Deep Learning-Based Video Retrieval Using Object Relationships and Associated Audio Classes

This paper introduces a video retrieval tool for the 2020 Video Browser Showdown (VBS). The tool enhances the user’s video browsing experience by ensuring full use of video analysis database constructed prior to the Showdown. Deep learning based object detection, scene text detection, scene color detection, audio classification and relation detection with scene graph generation methods have been used to construct the data. The data is composed of visual, textual, and auditory information, broadening the scope to which a user can search beyond visual information. In addition, the tool provides a simple and user-friendly interface for novice users to adapt to the tool in little time.

[1]  Jonathan G. Fiscus,et al.  TRECVID 2018: Benchmarking Video Activity Detection, Video Captioning and Matching, Video Storytelling Linking and Video Search , 2018, TRECVID.

[2]  Klaus Schöffmann,et al.  A User-Centric Media Retrieval Competition: The Video Browser Showdown 2012-2014 , 2014, IEEE Multim..

[3]  Dongyoon Han,et al.  Character Region Awareness for Text Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Ralph Gasser,et al.  Interactive Search or Sequential Browsing? A Detailed Analysis of the Video Browser Showdown 2018 , 2019, ACM Trans. Multim. Comput. Commun. Appl..

[5]  George Awad,et al.  V3C1 Dataset: An Evaluation of Content Characteristics , 2019, ICMR.

[6]  Yong Xu,et al.  Audio Set Classification with Attention Model: A Probabilistic Perspective , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7]  George Awad,et al.  V3C - a Research Video Collection , 2018, MMM.

[8]  Ji Zhang,et al.  Graphical Contrastive Losses for Scene Graph Parsing , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Michael S. Bernstein,et al.  Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.

[10]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Danfei Xu,et al.  Scene Graph Generation by Iterative Message Passing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Vipin Tyagi,et al.  An efficient technique for retrieval of color images in large databases , 2015, Comput. Electr. Eng..

[13]  Stefan Lee,et al.  Graph R-CNN for Scene Graph Generation , 2018, ECCV.

[14]  George Awad,et al.  On Influential Trends in Interactive Video Retrieval: Video Browser Showdown 2015–2017 , 2018, IEEE Transactions on Multimedia.