Self-Learning Camera: Autonomous Adaptation of Object Detectors to Unlabeled Video Streams

Learning object detectors requires massive amounts of labeled training samples from the specific data source of interest. This is impractical when dealing with many different sources (e.g., in camera networks) or with constantly changing ones such as mobile cameras (e.g., in robotics or driver assistance systems). In this paper, we address the problem of self-learning detectors in an autonomous manner, i.e., (i) detectors continuously update themselves to adapt efficiently to streaming data sources (contrary to transductive algorithms), (ii) without any labeled data strongly related to the target data stream (contrary to self-paced learning), and (iii) without manual intervention to set and update hyper-parameters. To that end, we propose an unsupervised, on-line, and self-tuning learning algorithm that optimizes a convex multi-task learning objective. Our method uses confident but laconic oracles (high-precision but low-recall off-the-shelf generic detectors) and exploits the structure of the problem to jointly learn, on-line, an ensemble of instance-level trackers, from which we derive an adapted category-level object detector. Our approach is validated on real-world, publicly available video object datasets.
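
As a rough illustration only, the idea of jointly learning instance-level models on-line and deriving a category-level model from them can be sketched with a generic mean-regularized multi-task objective. The abstract does not specify the objective, the loss, or the update rule, so everything below is an assumption: the logistic loss, the coupling term (lam / 2) * sum_i ||w_i - w_bar||^2, the fixed step size lr (the paper's method is self-tuning, which this sketch is not), and the function names multitask_sgd_step and category_detector are hypothetical and are not the authors' algorithm.

```python
import numpy as np


def logistic_grad(w, x, y):
    """Gradient of log(1 + exp(-y * w.x)) with respect to w, for y in {-1, +1}."""
    margin = y * np.dot(w, x)
    return -y * x / (1.0 + np.exp(margin))


def multitask_sgd_step(W, i, x, y, lr=0.1, lam=0.5):
    """One on-line update of instance-level model i (hypothetical helper).

    W is a (num_trackers, dim) array, one row per tracker. The update follows
    the gradient of the assumed objective: the logistic loss of tracker i on
    the sample (x, y), plus a term pulling w_i toward the mean of all trackers.
    W is modified in place and also returned.
    """
    w_bar = W.mean(axis=0)
    grad = logistic_grad(W[i], x, y) + lam * (W[i] - w_bar)
    W[i] = W[i] - lr * grad
    return W


def category_detector(W):
    """Assumed category-level model: the mean of the instance-level models."""
    return W.mean(axis=0)


# Toy stream: three trackers whose positives share a common direction.
# The data is synthetic and only exercises the update rule.
rng = np.random.default_rng(0)
dim, num_trackers = 5, 3
shared = np.ones(dim)
offsets = 0.3 * rng.standard_normal((num_trackers, dim))
W = np.zeros((num_trackers, dim))

for _ in range(300):
    i = int(rng.integers(num_trackers))
    if rng.random() < 0.5:
        # Positive sample: looks like the instance tracked by model i.
        x, y = shared + offsets[i] + 0.3 * rng.standard_normal(dim), 1
    else:
        # Negative sample: background, away from the shared direction.
        x, y = -shared + 0.3 * rng.standard_normal(dim), -1
    multitask_sgd_step(W, i, x, y)

detector = category_detector(W)
```

In this sketch the coupling term is what lets trackers share information: each per-instance model stays close to the ensemble mean, and that mean doubles as the category-level scoring vector. A real system would replace the synthetic samples with features extracted around oracle detections and tracked windows.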
