UQMG @ TRECVID 2015: Instance Search

The UQMG group submitted three runs, all fully automatic, for the instance search task at TRECVID 2015 [16]. Instead of adopting a traditional retrieval approach such as the Bag-of-Visual-Words (BoVW) model, our approach consists of three major steps: video decomposition, feature extraction, and indexing. During decomposition, video segmentation is applied and visual objects are extracted. A visual object is the minimal retrieval unit, and a single video may contain thousands of objects. We then extract a visual feature for each object with a convolutional neural network (ConvNet): the high-dimensional vector output by a fully connected layer of the network. Finally, instance search is cast as finding the approximate nearest neighbors (ANN) of a given query among a large set of points in this high-dimensional feature space. Our best run achieves a mean average precision (mAP) of 0.114.
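As a rough illustration of the feature extraction and indexing steps, the sketch below computes a fully-connected-layer descriptor with Caffe's Python interface [10] and indexes the descriptors in a random-hyperplane locality-sensitive hashing (LSH) table for approximate nearest-neighbor lookup, in the spirit of [7]. The model file paths, the layer name `fc7`, and the hash parameters are illustrative assumptions, not the exact configuration used in our runs.

```python
# Minimal sketch (not our exact run configuration): extract an fc-layer ConvNet
# descriptor with pycaffe and index it with random-hyperplane LSH.
import numpy as np
import caffe  # Python interface of the Caffe framework [10]


def build_extractor(model_def, model_weights, layer='fc7'):
    """Load a ConvNet and return a function mapping an image file to an fc-layer vector."""
    # model_def / model_weights are hypothetical deploy.prototxt / .caffemodel paths.
    net = caffe.Net(model_def, model_weights, caffe.TEST)
    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    transformer.set_transpose('data', (2, 0, 1))      # HWC -> CHW
    transformer.set_raw_scale('data', 255)            # [0, 1] floats -> [0, 255]
    transformer.set_channel_swap('data', (2, 1, 0))   # RGB -> BGR

    def extract(image_path):
        image = caffe.io.load_image(image_path)
        transformed = transformer.preprocess('data', image)
        net.blobs['data'].reshape(1, *transformed.shape)
        net.blobs['data'].data[...] = transformed
        net.forward()
        return net.blobs[layer].data[0].copy()

    return extract


class HyperplaneLSH:
    """Single-table random-hyperplane LSH: the sign pattern of projections is the bucket key."""

    def __init__(self, dim, n_bits=32, seed=0):
        rng = np.random.RandomState(seed)
        self.planes = rng.randn(n_bits, dim)
        self.buckets = {}

    def _key(self, vec):
        return tuple((self.planes.dot(vec) > 0).tolist())

    def add(self, item_id, vec):
        self.buckets.setdefault(self._key(vec), []).append((item_id, vec))

    def query(self, vec, top_k=10):
        # Rank only the candidates that collide with the query's bucket, by cosine similarity.
        candidates = self.buckets.get(self._key(vec), [])
        scored = [(item_id,
                   float(np.dot(v, vec) /
                         (np.linalg.norm(v) * np.linalg.norm(vec) + 1e-12)))
                  for item_id, v in candidates]
        return sorted(scored, key=lambda s: -s[1])[:top_k]
```

A query image is mapped through the same network and probed against the index; in practice several hash tables (and larger candidate pools) would be maintained to trade memory for recall.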

[1] Atsuto Maki, et al. Visual Instance Retrieval with Deep Convolutional Networks, 2014, ICLR.

[2] Kunio Kashino, et al. BM25 With Exponential IDF for Instance Search, 2014, IEEE Transactions on Multimedia.

[3] Thomas Brox, et al. A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis, 2013, IEEE International Conference on Computer Vision (ICCV).

[4] Thomas Brox, et al. Spectral Graph Reduction for Efficient Image and Streaming Video Segmentation, 2014, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Bernt Schiele, et al. Classifier based graph construction for video segmentation, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Mei Han, et al. Efficient hierarchical graph-based video segmentation, 2010, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Alexandr Andoni, et al. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions, 2006, 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS).

[8] Shin'ichi Satoh, et al. Large vocabulary quantization for searching instances from videos, 2012, ICMR.

[9] Atsuto Maki, et al. From generic to specific deep representations for visual recognition, 2015, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[10] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.

[11] Kai Li, et al. Efficient k-nearest neighbor graph construction for generic similarity measures, 2011, WWW.

[12] Jie Lin, et al. DeepHash: Getting Regularization, Depth and Fine-Tuning Right, 2015, arXiv.

[13] David Stutz, et al. Neural Codes for Image Retrieval, 2015.

[14] Shin'ichi Satoh, et al. Query-Adaptive Asymmetrical Dissimilarities for Visual Object Retrieval, 2013, IEEE International Conference on Computer Vision (ICCV).

[15] Shuicheng Yan, et al. SOLD: Sub-optimal low-rank decomposition for efficient video segmentation, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Georges Quénot, et al. TRECVID 2015 - An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics, 2015, TRECVID.