Inference Adaptive Thresholding based Non-Maximum Suppression for Object Detection in Video Image Sequence

This study proposes a novel inference adaptive thresholding based non-maximum suppression (IAT-NMS) algorithm that exploits temporal cues between frames of a video sequence. Temporal connectivity is first inferred from an overlap measure between bounding boxes in adjacent frames. Frames containing high-confidence detections are taken as key frames, whose scores are used to raise the scores of neighboring detections and thereby preserve potential detections of blurred objects with low scores. Then, the bounding boxes within each frame are ranked by confidence score, and the overlap ratio between the highest-scoring box and each of the remaining surrounding boxes is computed. This overlap measure is passed to a Gaussian function to estimate weights for adaptive suppression, so that the detection scores of severely overlapped boxes are suppressed softly rather than discarded. The proposed method is compared with state-of-the-art video object detection techniques. With IAT-NMS, overlapping objects that were indistinguishable under the compared methods become detectable. Experimental results demonstrate that this simple, unsupervised method outperforms state-of-the-art NMS algorithms, with an increase of 6% in mean average precision (mAP) on the ImageNet VID dataset. Our study of performance limitations and sensitivity to parameter variations also finds that IAT-NMS has better detection capability than the three compared algorithms, which fail to detect all targets or to distinguish them when multiple targets overlap.
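
The per-frame suppression step described above can be illustrated with a minimal sketch. It assumes the Gaussian decay exp(-IoU^2 / sigma) used by Soft-NMS [16]; the abstract does not give the exact IAT-NMS weighting, its adaptive threshold, or the temporal rescoring step, so none of these is reproduced here. The names `iou` and `gaussian_soft_suppress` and the parameters `sigma` and `min_score` are hypothetical and chosen for illustration only.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def gaussian_soft_suppress(boxes, scores, sigma=0.5, min_score=0.001):
    """Soft suppression with a Gaussian weight (assumed form, not the paper's).

    Returns a list of (box_index, rescored_confidence) in selection order.
    """
    remaining = [[i, float(s)] for i, s in enumerate(scores)]
    kept = []
    while remaining:
        # Select the remaining box with the highest current score.
        best = max(range(len(remaining)), key=lambda k: remaining[k][1])
        idx, score = remaining.pop(best)
        kept.append((idx, score))

        survivors = []
        for j, s in remaining:
            # Decay each remaining score by its overlap with the selected box.
            overlap = iou(boxes[idx], boxes[j])
            s *= math.exp(-(overlap ** 2) / sigma)
            # Drop boxes whose score has decayed below the floor.
            if s >= min_score:
                survivors.append([j, s])
        remaining = survivors
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    for index, score in gaussian_soft_suppress(boxes, scores):
        print(index, round(score, 4))
```

In this sketch a heavily overlapped box keeps a reduced score instead of being removed outright, which is the behavior that allows closely overlapping objects to remain detectable after suppression.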
