CAN: Composite Appearance Network and a Novel Evaluation Metric for Person Tracking

Tracking multiple people across multiple cameras is an open problem. It is typically divided into two tasks: (i) single-camera tracking (SCT), which identifies trajectories within the same scene, and (ii) inter-camera tracking (ICT), which identifies trajectories across cameras in real surveillance scenes. Many methods address SCT, while ICT remains a challenge. In this paper, we present a feature aggregation architecture called Composite Appearance Network (CAN) to address this problem. The key component of this architecture, EvalNet, attends to each feature vector and learns to weight it based on the gradients it receives for the overall template, so as to optimize re-identification performance. We demonstrate the efficiency of our approach with experiments on the challenging multi-camera tracking dataset DukeMTMC. We also survey existing tracking measures and present an online error metric called "Inference Error" (IE) that provides a better estimate of tracking/re-identification error by treating SCT and ICT errors uniformly.
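The weighted aggregation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' EvalNet: the abstract does not specify its layers, inputs, or training loss. The sketch only assumes that each per-frame feature vector receives a scalar score, that scores are normalized with a softmax, and that the template is the resulting weighted sum. The names `aggregate` and `score_vector` are hypothetical, and in a trained system the scoring parameters would be learned from re-identification gradients rather than drawn at random.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def aggregate(features, score_vector):
    """Combine N feature vectors into one template.

    features:     (N, D) array, one appearance feature per detection.
    score_vector: (D,) array, a hypothetical stand-in for a learned scorer.
    Returns the (D,) template and the (N,) weights used to build it.
    """
    scores = features @ score_vector   # one scalar score per feature vector
    weights = softmax(scores)          # non-negative, sums to 1
    template = weights @ features      # weighted sum of the feature vectors
    return template, weights


# Illustrative data only: 5 detections with 8-dimensional features.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8))
score_vector = rng.normal(size=8)

template, weights = aggregate(features, score_vector)
print(template.shape, weights.sum())
```

With an all-zero `score_vector` every feature gets the same weight, so the template reduces to plain average pooling; a learned scorer departs from that baseline by emphasizing some feature vectors over others.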
