Enhanced Retrieval and Browsing in the IMOTION System

This paper presents the IMOTION system in its third version. While still focusing on sketch-based retrieval, we improved upon the semantic retrieval capabilities introduced in the previous version by adding more detectors and improving the interface for semantic query specification. In addition to previous year’s system, we increase the role of features obtained from Deep Neural Networks in three areas: semantic class labels for more entry-level concepts, hidden layer activation vectors for query-by-example and 2D semantic similarity results display. The new graph-based result navigation interface further enriches the system’s browsing capabilities. The updated database storage system \(\textsf {ADAM}_{{pro }}\) designed from the ground up for large scale multimedia applications ensures the scalability to steadily growing collections.

[1]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[2]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[3]  Heiko Schuldt,et al.  IMOTION - A Content-Based Video Retrieval Engine , 2015, MMM.

[4]  Heiko Schuldt,et al.  Cineast: A Multi-feature Sketch-Based Video Retrieval Engine , 2014, 2014 IEEE International Symposium on Multimedia.

[5]  Li Fei-Fei,et al.  DenseCap: Fully Convolutional Localization Networks for Dense Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Hans-Jörg Schek,et al.  A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces , 1998, VLDB.

[7]  Kai Uwe Barthel,et al.  Graph-Based Browsing for Large Video Collections , 2015, MMM.

[8]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[9]  Heiko Schuldt,et al.  vitrivr: A Flexible Retrieval Stack Supporting Multiple Query Modes for Searching in Multimedia Collections , 2016, ACM Multimedia.

[10]  Leonidas J. Guibas,et al.  Human action recognition by learning bases of action attributes and parts , 2011, 2011 International Conference on Computer Vision.

[11]  Heiko Schuldt,et al.  ADAMpro: Database Support for Big Multimedia Retrieval , 2016, Datenbank-Spektrum.

[12]  Pietro Perona,et al.  Describing Common Human Visual Actions in Images , 2015, BMVC.

[13]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[14]  Luca Rossetto,et al.  Interactive video search tools: a detailed analysis of the video browser showdown 2015 , 2016, Multimedia Tools and Applications.

[15]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Heiko Schuldt,et al.  IMOTION - Searching for Video Sequences Using Multi-Shot Sketch Queries , 2016, MMM.

[17]  Michael S. Bernstein,et al.  Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.

[18]  Laurent Amsaleg,et al.  A large-scale performance study of cluster-based high-dimensional indexing , 2010, VLS-MCMR '10.