Is My Object in This Video? Reconstruction-based Object Search in Videos

This paper addresses video-level object instance search, which aims to retrieve the videos in a database that contain a given query object instance. Without prior knowledge of “when” and “where” an object of interest may appear in a video, determining “whether” the video contains the target object is computationally prohibitive, because it requires exhaustively matching the query against every spatio-temporal location in the video where an object may appear. To reduce the computational and memory cost, we propose the Reconstruction-based Object SEarch (ROSE) method. ROSE encodes the huge corpus of features extracted from the possible spatio-temporal locations in a video into the parameters of a reconstruction model. Since storing the reconstruction model takes much less memory than storing the features of all possible spatio-temporal locations, search efficiency is significantly improved. Comprehensive experiments on three benchmark datasets demonstrate the promising performance of the proposed ROSE method.
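To make the idea concrete, the following is a minimal sketch and not the ROSE implementation. The abstract does not specify the form of the reconstruction model or the scoring rule, so the sketch assumes a low-rank linear (PCA-style) subspace fitted per video and assumes that videos are ranked by how well their model reconstructs the query feature. The function names `fit_video_model`, `reconstruction_error`, and `rank_videos` are hypothetical.

```python
import numpy as np


def fit_video_model(features, rank):
    """Compress an (n, d) matrix of location features into a rank-r linear model.

    Assumption: a PCA-style subspace stands in for the reconstruction model.
    Only the mean (d values) and the basis (rank * d values) are stored,
    instead of all n * d feature values.
    """
    mean = features.mean(axis=0)
    # Right singular vectors of the centered features span the best rank-r subspace.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:rank]


def reconstruction_error(model, query):
    """Distance between the query feature and its projection onto the video's subspace."""
    mean, basis = model
    centered = query - mean
    reconstructed = basis.T @ (basis @ centered)
    return float(np.linalg.norm(centered - reconstructed))


def rank_videos(models, query):
    """Return video ids sorted by ascending reconstruction error (assumed scoring rule)."""
    errors = {vid: reconstruction_error(m, query) for vid, m in models.items()}
    return sorted(errors, key=errors.get)
```

Under these assumptions, a video with n candidate locations and d-dimensional features is summarized by (rank + 1) * d numbers rather than n * d, and scoring a query needs one projection per video instead of n comparisons.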
