Towards Human-Machine Cooperation: Self-Supervised Sample Mining for Object Detection

Though quite challenging, leveraging large-scale unlabeled or partially labeled images in a cost-effective way has attracted increasing interest because of its great importance to computer vision. To tackle this problem, many Active Learning (AL) methods have been developed. However, these methods mainly define their sample selection criteria within a single image context, which leads to suboptimal robustness and makes them impractical for large-scale object detection. In this paper, aiming to remedy the drawbacks of existing AL methods, we present a principled Self-supervised Sample Mining (SSM) process that accounts for the real challenges in object detection. Specifically, our SSM process automatically discovers and pseudo-labels reliable region proposals to enhance the object detector via the introduced cross-image validation, i.e., pasting these proposals into different labeled images to comprehensively measure their values under different image contexts. Building on the SSM process, we propose a new AL framework that gradually incorporates unlabeled or partially labeled data into model learning while minimizing the annotation effort of users. Extensive experiments on two public benchmarks demonstrate that our proposed framework achieves performance comparable to state-of-the-art methods with significantly fewer annotations.
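The cross-image validation idea can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: images are toy nested lists of pixel values, the detector is a hypothetical callable (`Detector`), each proposal is pasted at one fixed location, and a single hypothetical threshold `accept` decides whether a proposal is pseudo-labeled automatically or deferred to a human annotator.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]         # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), bottom/right exclusive
# Hypothetical detector interface (assumption): scores one box in an image
# and returns (predicted_class, confidence).
Detector = Callable[[Image, Box], Tuple[int, float]]


@dataclass
class Proposal:
    patch: Image  # region cropped from an unlabeled image
    label: int    # class the current detector assigned it (pseudo-label candidate)


def paste(patch: Image, target: Image, top: int, left: int) -> Image:
    """Return a copy of `target` with `patch` pasted at (top, left)."""
    h, w = len(patch), len(patch[0])
    if top + h > len(target) or left + w > len(target[0]):
        raise ValueError("patch does not fit in target image")
    out = [row[:] for row in target]  # copy rows so `target` is not modified
    for i in range(h):
        out[top + i][left:left + w] = patch[i]
    return out


def cross_image_consistency(prop: Proposal, labeled_images: Sequence[Image],
                            detector: Detector, top: int = 0, left: int = 0) -> float:
    """Mean confidence with which `detector` re-predicts prop.label after the
    patch is pasted into each labeled image (zero credit on a class mismatch)."""
    h, w = len(prop.patch), len(prop.patch[0])
    box = (top, left, top + h, left + w)
    total = 0.0
    for img in labeled_images:
        cls, conf = detector(paste(prop.patch, img, top, left), box)
        total += conf if cls == prop.label else 0.0
    return total / len(labeled_images)


def mine(proposals: Sequence[Proposal], labeled_images: Sequence[Image],
         detector: Detector, accept: float = 0.8):
    """Split proposals into automatically pseudo-labeled ones and ones left
    for user annotation (the single-threshold rule is an assumption)."""
    pseudo_labeled, for_annotation = [], []
    for p in proposals:
        score = cross_image_consistency(p, labeled_images, detector)
        (pseudo_labeled if score >= accept else for_annotation).append(p)
    return pseudo_labeled, for_annotation


def toy_detector(img: Image, box: Box) -> Tuple[int, float]:
    """Stand-in detector for the demo: class 1 if the box is bright on average."""
    t, l, b, r = box
    vals = [v for row in img[t:b] for v in row[l:r]]
    mean = sum(vals) / len(vals)
    return (1, 0.9) if mean > 0.5 else (0, 0.6)


labeled = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
bright = Proposal(patch=[[1.0, 1.0], [1.0, 1.0]], label=1)
dark = Proposal(patch=[[0.0, 0.0], [0.0, 0.0]], label=1)
pseudo, ask_user = mine([bright, dark], labeled, toy_detector)
print(len(pseudo), len(ask_user))  # 1 1
```

The bright patch keeps its class in every pasted context and is pseudo-labeled, while the dark patch's candidate label is contradicted once its original context is removed, so it is deferred to the user. A real system would paste at varied locations and use a trained detector; those choices are outside what the abstract specifies.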
