Tracking based sparse box proposal for time constraint detection in video stream

Search and Rescue or surveillance applications using an embedded moving camera pose challenging computer vision problems, as both very high precision/recall and real-time performance are required. However, in these contexts, it is often sufficient to detect each object of interest only once in the area spanned by the camera: what matters is to be aware of the object's existence, and its time to detection is a secondary objective. Taking advantage of this point, we describe a sparse box proposal controlled by tracking and designed to generate a small number of boxes per frame while covering each object at least once in the video. Our sparse box proposal adapts to the budget allowed for single box classification and is able to achieve relevant results even in challenging situations while comfortably meeting real-time requirements.
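The budget idea in the abstract can be illustrated with a small sketch. This is not the method of the paper, whose details are not given here; it only assumes that a tracker supplies one box per tracked candidate in each frame and that at most `budget` boxes may be sent to the classifier per frame. The names `TrackState` and `select_boxes` and the priority rule (least-classified track first, then longest-waiting) are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), an assumed box format


@dataclass
class TrackState:
    """Bookkeeping per tracked candidate (hypothetical structure)."""
    times_classified: int = 0
    last_frame: int = -1  # frame index of the last classification, -1 if never


def select_boxes(active: Dict[int, Box], states: Dict[int, TrackState],
                 budget: int, frame_idx: int) -> List[Tuple[int, Box]]:
    """Return at most `budget` (track_id, box) pairs to classify in this frame."""
    for tid in active:
        states.setdefault(tid, TrackState())
    # Least-classified tracks first; ties go to the track that waited longest,
    # then to the smallest track id so the order is deterministic.
    ranked = sorted(
        active,
        key=lambda tid: (states[tid].times_classified, states[tid].last_frame, tid),
    )
    chosen = ranked[:max(budget, 0)]
    for tid in chosen:
        states[tid].times_classified += 1
        states[tid].last_frame = frame_idx
    return [(tid, active[tid]) for tid in chosen]


# Usage: three tracked candidates stay visible for three frames, budget of one box.
states: Dict[int, TrackState] = {}
active = {1: (10, 10, 20, 40), 2: (50, 12, 18, 36), 3: (90, 8, 22, 44)}
for frame_idx in range(3):
    picked = select_boxes(active, states, budget=1, frame_idx=frame_idx)
    print(frame_idx, [tid for tid, _ in picked])
```

With a budget of one box per frame, each of the three candidates is classified exactly once over the three frames, which mirrors the goal of covering every object at least once while keeping the per-frame cost fixed.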
