Analysing Domain Shift Factors between Videos and Images for Object Detection

Object detection is one of the most important challenges in computer vision. Object detectors are usually trained on bounding-boxes from still images. Recently, video has been used as an alternative source of data. Yet, for a given test domain (image or video), the performance of the detector depends on the domain it was trained on. In this paper, we examine the reasons behind this performance gap. We define and evaluate different domain shift factors: spatial location accuracy, appearance diversity, image quality and aspect distribution. We examine the impact of these factors by comparing performance before and after factoring them out. The results show that all four factors affect the performance of the detectors and their combined effect explains nearly the whole performance gap.

[1]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[2]  Alexei A. Efros,et al.  Ensemble of exemplar-SVMs for object detection and beyond , 2011, 2011 International Conference on Computer Vision.

[3]  Andrew Zisserman,et al.  An Exemplar Model for Learning Object Classes , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Hans-Peter Kriegel,et al.  Integrating structured biological data by Kernel Maximum Mean Discrepancy , 2006, ISMB.

[6]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[7]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Cordelia Schmid,et al.  Multi-fold MIL Training for Weakly Supervised Object Localization , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Cordelia Schmid,et al.  Learning object class detectors from weakly annotated video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Vladimir Kolmogorov,et al.  Object cosegmentation , 2011, CVPR 2011.

[11]  Yong Jae Lee,et al.  Key-segments for video object segmentation , 2011, 2011 International Conference on Computer Vision.

[12]  Vittorio Ferrari,et al.  Associative Embeddings for Large-Scale Knowledge Transfer with Self-Assessment , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Fei-Fei Li,et al.  Shifting Weights: Adapting Object Detectors from Image to Video , 2012, NIPS.

[14]  Limin Wang,et al.  Video Action Detection with Relational Dynamic-Poselets , 2014, ECCV.

[15]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[16]  Ivor W. Tsang,et al.  Domain Transfer Multiple Kernel Learning , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[18]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[19]  Tao Xiang,et al.  In Defence of Negative Mining for Annotating Weakly Labelled Data , 2012, ECCV.

[20]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[21]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[22]  Fei-Fei Li,et al.  Discriminative Segment Annotation in Weakly Labeled Video , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[24]  Ivor W. Tsang,et al.  Visual Event Recognition in Videos by Learning from Web Data , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Thomas Deselaers,et al.  Measuring the Objectness of Image Windows , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Jitendra Malik,et al.  Object Segmentation by Long Term Analysis of Point Trajectories , 2010, ECCV.

[27]  Svetlana Lazebnik,et al.  Scene recognition and weakly supervised object localization with deformable part-based models , 2011, 2011 International Conference on Computer Vision.

[28]  Pramod Sharma,et al.  Efficient Detector Adaptation for Object Detection in a Video , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Matthieu Guillaumin,et al.  Segmentation Propagation in ImageNet , 2012, ECCV.

[30]  Rama Chellappa,et al.  Unsupervised Adaptation Across Domain Shifts by Generating Intermediate Data Representations , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Kate Saenko,et al.  Asymmetric and Category Invariant Feature Transformations for Domain Adaptation , 2014, International Journal of Computer Vision.

[32]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[33]  Zaïd Harchaoui,et al.  On learning to localize objects with minimal supervision , 2014, ICML.

[34]  Vittorio Ferrari,et al.  Fast Object Segmentation in Unconstrained Video , 2013, 2013 IEEE International Conference on Computer Vision.

[35]  Ming Yang,et al.  Regionlets for Generic Object Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[36]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[37]  Thomas Deselaers,et al.  Weakly Supervised Localization and Learning with Generic Knowledge , 2012, International Journal of Computer Vision.

[38]  Alexei A. Efros,et al.  Unbiased look at dataset bias , 2011, CVPR 2011.

[39]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[40]  Deva Ramanan,et al.  Self-Paced Learning for Long-Term Tracking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Tao Xiang,et al.  Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Tao Xiang,et al.  Weakly supervised object detector learning with model drift detection , 2011, 2011 International Conference on Computer Vision.

[43]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .