Learning Deep Appearance Feature for Multi-target Tracking

Multi-target tracking is an important problem in computer vision. In surveillance video, frequent occlusion and dense crowds make it harder. To address these difficulties, this paper proposes an effective algorithm for multi-target tracking in videos. First, Faster R-CNN is combined with a residual network to detect pedestrians in surveillance videos. This detector effectively eliminates invalid detection boxes, separates targets that walk together, and resists partial occlusion. Second, the paper puts forward an accurate and efficient appearance-feature matching network inspired by pedestrian re-identification. Its deep feature-extraction module is composed of a stem CNN and ResNet blocks, so it can load the ResNet-50 Caffe model as pretrained weights to improve feature-extraction accuracy. The proposed network also reduces training and testing time compared with ResNet. Finally, the resulting multi-target trajectories are further refined by a strategy of occlusion discrimination, deduplication, and merging. Experimental results on the 2D MOT 2015 benchmark and the KITTI dataset indicate that the proposed algorithm outperforms alternative multi-object trackers on multiple metrics.
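To make the appearance-matching step concrete, the sketch below pairs existing tracks with new detections by comparing their appearance feature vectors. It is only an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the similarity measure or the assignment rule, so cosine similarity, greedy assignment, the `min_sim` threshold, and the function names are all hypothetical choices, and the feature vectors are assumed to come from some appearance network.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero feature as matching nothing
    return dot / (norm_a * norm_b)


def match_by_appearance(track_feats, det_feats, min_sim=0.5):
    """Greedily pair tracks with detections by appearance similarity.

    track_feats / det_feats map an id to a feature vector.
    min_sim is a hypothetical threshold (assumption, not from the paper).
    Returns a dict {track_id: detection_id}.
    """
    candidates = []
    for t_id, t_feat in track_feats.items():
        for d_id, d_feat in det_feats.items():
            sim = cosine_similarity(t_feat, d_feat)
            if sim >= min_sim:
                candidates.append((sim, t_id, d_id))

    # Accept the most similar pairs first; each track and each
    # detection is used at most once.
    candidates.sort(key=lambda c: c[0], reverse=True)
    matches, used_dets = {}, set()
    for _, t_id, d_id in candidates:
        if t_id in matches or d_id in used_dets:
            continue
        matches[t_id] = d_id
        used_dets.add(d_id)
    return matches


tracks = {1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0]}
detections = {"a": [0.9, 0.1, 0.0], "b": [0.1, 0.95, 0.0], "c": [0.0, 0.0, 1.0]}
matches = match_by_appearance(tracks, detections)
```

In this toy input, detection `c` resembles neither track and stays unmatched, which is the case where a tracker would typically start a new trajectory. A globally optimal assignment (for example, the Hungarian algorithm) could replace the greedy loop without changing the interface.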
