Seq-NMS for Video Object Detection

Video object detection is challenging because objects that are easily detected in one frame may be difficult to detect in another frame within the same clip. Recently, there have been major advances for doing object detection in a single image. These methods typically contain three phases: (i) object proposal generation (ii) object classification and (iii) post-processing. We propose a modification of the post-processing phase that uses high-scoring object detections from nearby frames to boost scores of weaker detections within the same clip. We show that our method obtains superior results to state-of-the-art single image object detection techniques. Our method placed 3rd in the video object detection (VID) task of the ImageNet Large Scale Visual Recognition Challenge 2015 (ILSVRC2015).

[1]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[2]  Luc Van Gool,et al.  Robust tracking-by-detection using a detector confidence particle filter , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[3]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[4]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[5]  Pascal Fua,et al.  Robust People Tracking with Global Trajectory Optimization , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[6]  A. G. Amitha Perera,et al.  Multi-Object Tracking Through Simultaneous Long Occlusions and Split-Merge Conditions , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[7]  Z. Zivkovic Improved adaptive Gaussian mixture model for background subtraction , 2004, ICPR 2004.

[8]  Andreas Geiger,et al.  Understanding High-Level Semantics by Modeling Traffic Patterns , 2013, 2013 IEEE International Conference on Computer Vision.

[9]  Xiaogang Wang,et al.  T-CNN: Tubelets With Convolutional Neural Networks for Object Detection From Videos , 2016, IEEE Transactions on Circuits and Systems for Video Technology.

[10]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Pascal Fua,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 Multiple Object Tracking Using K-shortest Paths Optimization , 2022 .

[12]  Larry S. Davis,et al.  Fast multiple object tracking via a hierarchical particle filter , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[13]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[15]  James J. Little,et al.  A Linear Programming Approach for Multiple Object Tracking , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[18]  Jack K. Wolf,et al.  Finding the best set of K paths through a trellis with application to multitarget tracking , 1989 .

[19]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.