CATS: Co-saliency Activated Tracklet Selection for Video Co-localization

Video co-localization is the task of jointly localizing common objects across videos. Because object appearance varies both across videos and within each video, identifying and tracking the common objects without any supervision is challenging. In contrast to previous joint frameworks that attack the problem with bounding box proposals, we propose to leverage co-saliency activated tracklets. To identify the common visual object, we first exploit inter-video commonness, intra-video commonness, and motion saliency to generate co-saliency maps. Object proposals with high objectness and co-saliency scores are then tracked over short video intervals to build tracklets. The best tube for a video is obtained by selecting one tracklet per interval using dynamic programming, based on tracklet confidence and the smoothness between adjacent tracklets. Experimental results on the benchmark YouTube Object dataset show that the proposed method outperforms state-of-the-art methods.
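The tracklet selection step can be illustrated with a minimal sketch. It is not the authors' implementation: the `Tracklet` fields, the use of IoU between the last box of one tracklet and the first box of the next as the smoothness term, and the `weight` parameter are all assumptions made for illustration, since the abstract does not define them. The sketch only shows the general dynamic programming pattern of picking one tracklet per interval so that the summed confidence and adjacent smoothness is maximal.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


class Tracklet(NamedTuple):
    # Hypothetical container; the field names are assumptions.
    confidence: float  # e.g. combined objectness and co-saliency score
    first_box: Box     # box in the first frame of the interval
    last_box: Box      # box in the last frame of the interval


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def select_tube(intervals: List[List[Tracklet]], weight: float = 1.0) -> List[int]:
    """Return one tracklet index per interval, maximizing
    sum(confidence) + weight * sum(smoothness between neighbours)."""
    if not intervals or any(not cands for cands in intervals):
        raise ValueError("every interval needs at least one tracklet")

    # score[k]: best total score of any path ending at tracklet k
    score = [t.confidence for t in intervals[0]]
    back: List[List[int]] = []  # back-pointers, one list per transition

    for prev, cur in zip(intervals, intervals[1:]):
        new_score, pointers = [], []
        for t in cur:
            # Score of reaching t from each candidate in the previous interval.
            trans = [score[j] + weight * iou(prev[j].last_box, t.first_box)
                     for j in range(len(prev))]
            best_j = max(range(len(prev)), key=trans.__getitem__)
            new_score.append(trans[best_j] + t.confidence)
            pointers.append(best_j)
        score = new_score
        back.append(pointers)

    # Backtrack from the best final tracklet.
    k = max(range(len(score)), key=score.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path
```

With N intervals and at most K candidate tracklets per interval, this runs in O(N K^2) time, which is why an exact search over all tracklet combinations is not needed.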
