Video Object Detection Guided by Object Blur Evaluation

In recent years, strong image-based object detection algorithms have been applied directly to video object detection. Such frame-by-frame processing is suboptimal because object appearance degrades in video through motion blur, defocus, and rare poses. Existing work on video object detection mostly focuses on feature aggregation at the pixel level and the instance level, but the impact of blur on the aggregation process has not been well exploited so far. In this article, we propose an end-to-end blur-aid feature aggregation network (BFAN) for video object detection. BFAN focuses on how blur, including motion blur and defocus, influences the aggregation process, and it achieves high accuracy with little additional computation. In BFAN, we evaluate the object blur degree of each frame and use it as the weight for aggregation. Notably, the background is usually flat, which has a negative impact on the evaluation of the object blur degree. We therefore introduce a lightweight saliency detection network to alleviate the background interference. Experiments conducted on the ImageNet VID dataset show that BFAN achieves state-of-the-art detection performance of 79.1% mAP, an improvement of 3 points over the video object detection baseline.
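The idea of weighting each frame by its object blur degree, with the background suppressed by a saliency mask, can be illustrated with a minimal sketch. This is not the BFAN implementation: the abstract does not specify how blur is measured or how weights are normalized, so the sketch assumes a classical variance-of-Laplacian sharpness score, a softmax over frames, and a weighted sum of per-frame feature vectors. The function names `laplacian_sharpness`, `aggregation_weights`, and `aggregate`, and the `temperature` parameter, are hypothetical.

```python
import math


def laplacian_sharpness(image, mask):
    """Variance of the 4-neighbour Laplacian over salient interior pixels.

    Assumption: a classical sharpness proxy stands in for the learned blur
    evaluation. `image` and `mask` are 2-D lists of floats of equal size;
    pixels with mask <= 0.5 are treated as background and ignored.
    """
    height, width = len(image), len(image[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            if mask[y][x] > 0.5:
                lap = (image[y - 1][x] + image[y + 1][x]
                       + image[y][x - 1] + image[y][x + 1]
                       - 4.0 * image[y][x])
                responses.append(lap)
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def aggregation_weights(sharpness_scores, temperature=1.0):
    """Softmax over per-frame sharpness so that sharper frames weigh more."""
    peak = max(sharpness_scores)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in sharpness_scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(features, weights):
    """Weighted sum of per-frame feature vectors (lists of equal length)."""
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features))
            for i in range(dim)]


# Toy usage: a sharp checkerboard frame versus a flat (fully blurred) frame.
sharp_frame = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
flat_frame = [[0.5 for _ in range(4)] for _ in range(4)]
full_mask = [[1.0 for _ in range(4)] for _ in range(4)]

scores = [laplacian_sharpness(frame, full_mask)
          for frame in (sharp_frame, flat_frame)]
weights = aggregation_weights(scores)
fused = aggregate([[1.0, 0.0], [0.0, 1.0]], weights)
```

In this toy case the sharp frame receives nearly all of the weight, so the fused feature stays close to that frame's feature. Restricting the score to salient pixels reflects the concern stated above: a flat background yields near-zero Laplacian responses regardless of blur and would otherwise distort the per-frame score.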
