Trajectory Ensemble: Multiple Persons Consensus Tracking Across Non-overlapping Multiple Cameras over Randomly Dropped Camera Networks

Multiple-person tracking over a camera network is usually performed by matching person images between adjacent cameras. This matching easily fails when a person's appearance changes over time because of environmental illumination or the observation orientation of a camera. Matching person images not only across adjacent cameras but also across cameras multiple hops away in the network is an effective remedy; however, relaxing the spatio-temporal cues in this way also causes tracking failures, because the number of matching candidates increases. To avoid these failures, we introduce "Random Camera Drop", which generates different camera networks that relax the spatio-temporal cues partially and randomly. We then integrate the tracking results over these networks into a consensus tracking result using a novel concept, "Trajectory Ensemble", an extension of unsupervised ensemble learning to the problem of multiple-person tracking over a camera network. We evaluated the framework on several virtual datasets generated from a public dataset, the "Shinpuhkan 2014 dataset", and confirmed that the proposed method achieves the best tracking results among the compared methods.
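
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm, and everything in it is an assumption made for illustration: the function names `random_camera_drop` and `consensus_trajectories`, the `drop_prob` and `threshold` parameters, the rule of directly linking the neighbours of a dropped camera, and the consensus step, which swaps in a simple co-association majority vote in the spirit of cluster ensembles. Each per-network tracking result is assumed to be a mapping from an observation id to a person label.

```python
import random
from itertools import combinations


def random_camera_drop(adjacency, drop_prob, rng):
    """Drop each camera with probability drop_prob (hypothetical rule).

    adjacency maps a camera to the set of its neighbours (undirected graph).
    Neighbours of a dropped camera are linked directly, so matching may skip
    the dropped camera, which relaxes the spatio-temporal cue only locally.
    """
    dropped = {cam for cam in adjacency if rng.random() < drop_prob}
    kept = {cam: set(nbrs) for cam, nbrs in adjacency.items()}
    for cam in dropped:
        neighbours = kept.pop(cam)
        for n in neighbours:
            kept[n].discard(cam)
        # Bridge the gap left by the dropped camera.
        for a, b in combinations(sorted(neighbours), 2):
            kept[a].add(b)
            kept[b].add(a)
    return kept, dropped


def consensus_trajectories(results, threshold=0.5):
    """Merge per-network tracking results by co-association voting.

    results is a list of dicts {observation_id: person_label}. Labels need
    not agree across results; only "same label or not" within one result is
    used. Observations missing from a result (dropped camera) do not vote.
    """
    observations = sorted({obs for r in results for obs in r})
    parent = {obs: obs for obs in observations}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(observations, 2):
        votes = [r[a] == r[b] for r in results if a in r and b in r]
        if votes and sum(votes) / len(votes) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for obs in observations:
        groups.setdefault(find(obs), []).append(obs)
    return sorted(groups.values())


if __name__ == "__main__":
    network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
    relaxed, dropped = random_camera_drop(network, 0.3, random.Random(0))
    print(relaxed, dropped)
```

In this sketch, a tracker would be run once per relaxed network, and the resulting label assignments passed together to `consensus_trajectories`. Using pairwise votes avoids having to align person labels between networks, at the cost of a quadratic pass over observations.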
