On Improving Bounding Box Regression Towards Accurate Object Detection and Tracking

This paper studies an improved bounding box regression technique for CNN-based object detection and tracking. The proposed approach uses a CNN to identify and localise objects, each of which is enclosed in a box by a region-of-interest regressor. In many cases the generated bounding box does not fit the detected object tightly, which leads to inaccurate results in spatial analysis such as geometric warping. The technique addresses this by combining background modelling, motion analysis, template correlation and Kalman filters. The experimental results suggest that the proposed method outperforms tracking techniques based on single-frame object detectors, with an improvement of over 80% in the context of surveillance camera-based pedestrian analysis.
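To make the idea of a loosely fitting box concrete, the following is a minimal sketch of how a foreground mask from a background model could be used to tighten a detector's box. It is an illustration under stated assumptions, not the paper's method: the function names `tighten_box` and `iou`, the box convention `(x0, y0, x1, y1)` with exclusive upper bounds, and the synthetic mask are all hypothetical, and the motion analysis, template correlation and Kalman filtering stages are not shown.

```python
import numpy as np


def tighten_box(foreground, box):
    """Shrink a box to the extent of the foreground pixels inside it.

    foreground: 2-D boolean array (True = moving/foreground pixel), assumed
                to come from some background model.
    box:        (x0, y0, x1, y1) in pixels, upper bounds exclusive.
    """
    height, width = foreground.shape
    # Clip the box to the image so that slicing behaves predictably.
    x0, y0 = max(0, box[0]), max(0, box[1])
    x1, y1 = min(width, box[2]), min(height, box[3])

    patch = foreground[y0:y1, x0:x1]
    rows = np.flatnonzero(patch.any(axis=1))  # rows containing foreground
    cols = np.flatnonzero(patch.any(axis=0))  # columns containing foreground
    if rows.size == 0 or cols.size == 0:
        return box  # no foreground evidence: keep the detector's box

    return (x0 + int(cols[0]), y0 + int(rows[0]),
            x0 + int(cols[-1]) + 1, y0 + int(rows[-1]) + 1)


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# Synthetic example: a 20x40 pixel "pedestrian" and a loose detector box.
mask = np.zeros((100, 100), dtype=bool)
mask[30:70, 40:60] = True
ground_truth = (40, 30, 60, 70)
loose = (30, 20, 70, 80)
tight = tighten_box(mask, loose)
```

In this synthetic case the loose box overlaps the ground truth with an IoU of one third, while the tightened box matches it exactly. A real foreground mask would be noisy, so some filtering of the mask and temporal smoothing of the box would likely be needed before such a step is useful.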
