Deep Siamese Network for Multiple Object Tracking

Multiple object tracking is an important but challenging computer vision task. Thanks to significant progress in object detection, tracking-by-detection has become a popular paradigm for tracking multiple objects simultaneously. Appearance models are also widely used to associate detection results across frames. In this paper, we combine cosine similarity metric learning with a very deep convolutional neural network, yielding a robust pairwise appearance matching model: a deep Siamese network capable of re-identifying the same object after a long time gap and of handling partial and complete occlusion. Embedded in existing tracking algorithms, our model serves as a lightweight but powerful module for deciding among track hypotheses. Experiments on the MOT Challenge 2016 benchmark [7] demonstrate the effectiveness of our model, which achieves state-of-the-art performance without extensive hyper-parameter tuning.
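As a rough illustration of the pairwise matching idea, the sketch below scores a track against candidate detections by the cosine similarity of their appearance embeddings. It is not the paper's implementation: it assumes the embeddings have already been produced by the two branches of a Siamese network, and the function names `cosine_similarity` and `best_match`, as well as the threshold value of 0.5, are hypothetical choices made for this example.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention assumed here: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    track_embedding: Sequence[float],
    detection_embeddings: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical value, not taken from the paper
) -> Optional[int]:
    """Index of the detection most similar to the track.

    Returns None when no detection scores strictly above the threshold,
    which a tracker could interpret as the object being occluded.
    """
    best_index: Optional[int] = None
    best_score = threshold
    for index, embedding in enumerate(detection_embeddings):
        score = cosine_similarity(track_embedding, embedding)
        if score > best_score:
            best_index, best_score = index, score
    return best_index


if __name__ == "__main__":
    track = [1.0, 0.0]
    detections = [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]]
    print(best_match(track, detections))
```

Because cosine similarity ignores vector magnitude, the score depends only on the direction of the embeddings. In a full tracker, such scores would typically feed a hypothesis-selection or assignment step rather than the greedy per-track choice shown here.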

[1] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[2] Bernt Schiele, et al. Multi-person Tracking by Multicut and Deep Matching, 2016, ECCV Workshops.

[3] Afshin Dehghan, et al. Target Identity-aware Network Flow for Online Multiple Target Tracking, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Fei-Fei Li, et al. ImageNet: A Large-Scale Hierarchical Image Database, 2009, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Qi Tian, et al. Scalable Person Re-identification: A Benchmark, 2015, IEEE International Conference on Computer Vision (ICCV).

[6] Yann LeCun, et al. Learning a Similarity Metric Discriminatively, with Application to Face Verification, 2005, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Stefan Roth, et al. MOT16: A Benchmark for Multi-Object Tracking, 2016, ArXiv.

[8] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.

[9] James M. Rehg, et al. Multiple Hypothesis Tracking Revisited, 2015, IEEE International Conference on Computer Vision (ICCV).

[10] Konrad Schindler, et al. Learning by Tracking: Siamese CNN for Robust Target Association, 2016, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[11] Li Bai, et al. Cosine Similarity Metric Learning for Face Verification, 2010, ACCV.

[12] Robert T. Collins, et al. Multi-target Tracking by Lagrangian Relaxation to Min-cost Network Flow, 2013, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Xiaogang Wang, et al. DeepReID: Deep Filter Pairing Neural Network for Person Re-identification, 2014, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Bernt Schiele, et al. Multiple People Tracking by Lifted Multicut and Person Re-identification, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Atilla Baskurt, et al. Triangular Similarity Metric Learning for Face Verification, 2015, 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).