Long-Term Identity-Aware Multi-Person Tracking for Surveillance Video Summarization

Multi-person tracking plays a critical role in the analysis of surveillance video. However, most existing work focus on shorter-term (e.g. minute-long or hour-long) video sequences. Therefore, we propose a multi-person tracking algorithm for very long-term (e.g. month-long) multi-camera surveillance scenarios. Long-term tracking is challenging because 1) the apparel/appearance of the same person will vary greatly over multiple days and 2) a person will leave and re-enter the scene numerous times. To tackle these challenges, we leverage face recognition information, which is robust to apparel change, to automatically reinitialize our tracker over multiple days of recordings. Unfortunately, recognized faces are unavailable oftentimes. Therefore, our tracker propagates identity information to frames without recognized faces by uncovering the appearance and spatial manifold formed by person detections. We tested our algorithm on a 23-day 15-camera data set (4,935 hours total), and we were able to localize a person 53.2% of the time with 69.8% precision. We further performed video summarization experiments based on our tracking output. Results on 116.25 hours of video showed that we were able to generate a reasonable visual diary (i.e. a summary of what a person did) for different people, thus potentially opening the door to automatic summarization of the vast amount of surveillance video generated every day.

[1]  Steven Gold,et al.  Softmax to Softassign: neural network algorithms for combinatorial optimization , 1996 .

[2]  Luc Van Gool,et al.  Real-time affine region tracking and coplanar grouping , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[3]  David A. Forsyth,et al.  30Hz Object Detection with DPM V5 , 2014, ECCV.

[4]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[5]  Nicolas Gillis,et al.  Two algorithms for orthogonal nonnegative matrix factorization with application to clustering , 2012, Neurocomputing.

[6]  Pascal Fua,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 Multiple Object Tracking Using K-shortest Paths Optimization , 2022 .

[7]  Jing Zhang,et al.  Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Afshin Dehghan,et al.  Target Identity-aware Network Flow for online multiple target tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Yi Yang,et al.  A Multimedia Retrieval Framework Based on Semi-Supervised Ranking and Relevance Feedback , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Robert T. Collins,et al.  Multiple Target Tracking Using Frame Triplets , 2012, ACCV.

[11]  Xiaoqin Zhang,et al.  Graph Based Discriminative Learning for Robust and Efficient Object Tracking , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[12]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[13]  Ramakant Nevatia,et al.  Learning to associate: HybridBoosted multi-target tracker for crowded scene , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Afshin Dehghan,et al.  GMCP-Tracker: Global Multi-object Tracking Using Generalized Minimum Clique Graphs , 2012, ECCV.

[15]  Chih-Jen Lin,et al.  Projected Gradient Methods for Nonnegative Matrix Factorization , 2007, Neural Computation.

[16]  Wenhan Luo,et al.  Multiple object tracking: A literature review , 2014, Artif. Intell..

[17]  Ramakant Nevatia,et al.  Global data association for multi-object tracking using network flows , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Andrew Zisserman,et al.  “Who are you?” - Learning person specific classifiers from video , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Konrad Schindler,et al.  Globally Optimal Multi-target Tracking on a Hexagonal Lattice , 2010, ECCV.

[20]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Ramakant Nevatia,et al.  Multi-target tracking by online learning of non-linear motion patterns and robust appearance models , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[23]  James J. Little,et al.  A Linear Programming Approach for Multiple Object Tracking , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Robert T. Collins,et al.  Multi-target Tracking by Lagrangian Relaxation to Min-cost Network Flow , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Alex Pentland,et al.  A Bayesian Computer Vision System for Modeling Human Interactions , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[26]  Konrad Schindler,et al.  Discrete-continuous optimization for multi-target tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Erkki Oja,et al.  Linear and Nonlinear Projective Nonnegative Matrix Factorization , 2010, IEEE Transactions on Neural Networks.

[28]  W. Eric L. Grimson,et al.  Adaptive background mixture models for real-time tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[29]  Mikhail Belkin,et al.  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[30]  尚弘 島影 National Institute of Standards and Technologyにおける超伝導研究及び生活 , 2001 .

[31]  Deva Ramanan,et al.  Efficiently Scaling up Crowdsourced Video Annotation , 2012, International Journal of Computer Vision.

[32]  Andrew Zisserman,et al.  The devil is in the details: an evaluation of recent feature encoding methods , 2011, BMVC.

[33]  Alan Fern,et al.  Discriminatively trained particle filters for complex multi-object tracking , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  A. Ellis,et al.  PETS2009 and Winter-PETS 2009 results: A combined evaluation , 2009, 2009 Twelfth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance.

[35]  Pascal Fua,et al.  Tracking Interacting Objects Optimally Using Integer Programming , 2014, ECCV.

[36]  Rainer Stiefelhagen,et al.  Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics , 2008, EURASIP J. Image Video Process..

[37]  Konrad Schindler,et al.  Multi-target tracking by continuous energy minimization , 2011, CVPR 2011.

[38]  Kuk-Jin Yoon,et al.  Robust Online Multi-object Tracking Based on Tracklet Confidence and Online Discriminative Appearance Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Xiaoou Tang,et al.  Surpassing Human-Level Face Verification Performance on LFW with GaussianFace , 2014, AAAI.

[40]  Seungjin Choi,et al.  Nonnegative Matrix Factorization with Orthogonality Constraints , 2010, J. Comput. Sci. Eng..

[41]  Luc Van Gool,et al.  Coupled Detection and Trajectory Estimation for Multi-Object Tracking , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[42]  Andrew Zisserman,et al.  Detecting People Looking at Each Other in Videos , 2014, International Journal of Computer Vision.

[43]  Xiaoqin Zhang,et al.  Single and Multiple Object Tracking Using Log-Euclidean Riemannian Subspace and Block-Division Appearance Model , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Andrew Zisserman,et al.  Taking the bite out of automated naming of characters in TV video , 2009, Image Vis. Comput..

[45]  James J. Little,et al.  Learning to Track and Identify Players from Broadcast Sports Videos , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Robert T. Collins,et al.  Multitarget data association with higher-order motion models , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Clark McPhail,et al.  Using Film to Analyze Pedestrian Behavior , 1982 .

[48]  Ivan Laptev,et al.  On pairwise costs for network flow multi-object tracking , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Ramakant Nevatia,et al.  Robust Object Tracking by Hierarchical Association of Detection Responses , 2008, ECCV.

[50]  Pascal Fua,et al.  Facial Descriptors for Identity-Preserving Multiple People Tracking , 2013 .

[51]  Pascal Fua,et al.  Multi-Commodity Network Flow for Tracking Multiple People , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[52]  Yi Yang,et al.  Learning to predict health status of geriatric patients from observational data , 2012, 2012 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB).

[53]  Frank Dellaert,et al.  MCMC-based particle filtering for tracking a variable number of interacting targets , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54]  Pascal Fua,et al.  Multicamera People Tracking with a Probabilistic Occupancy Map , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55]  Chris H. Q. Ding,et al.  Nonnegative Matrix Factorization for Combinatorial Optimization: Spectral Clustering, Graph Matching, and Clique Finding , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[56]  James J. Little,et al.  A Boosted Particle Filter: Multitarget Detection and Tracking , 2004, ECCV.

[57]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[58]  Andrea Cavallaro,et al.  Accepted for Publication in Ieee Transactions on Image Processing Adaptive Appearance Modeling for Video Tracking: Survey and Evaluation , 2022 .

[59]  D K Smith,et al.  Numerical Optimization , 2001, J. Oper. Res. Soc..

[60]  Charless C. Fowlkes,et al.  Globally-optimal greedy algorithms for tracking a variable number of objects , 2011, CVPR 2011.

[61]  Gang Wang,et al.  Tracklet Association with Online Target-Specific Metric Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[62]  Larry S. Davis,et al.  AVSS 2011 demo session: A large-scale benchmark dataset for event recognition in surveillance video , 2011, AVSS.

[63]  Cordelia Schmid,et al.  Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.

[64]  Ramakant Nevatia,et al.  An online learned CRF model for multi-target tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[65]  Konrad Schindler,et al.  Detection- and Trajectory-Level Exclusion in Multiple Object Tracking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[66]  Yi Yang,et al.  Harry Potter's Marauder's Map: Localizing and Tracking Multiple Persons-of-Interest by Nonnegative Discretization , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[67]  Ian D. Reid,et al.  Stable multi-target tracking in real-time surveillance video , 2011, CVPR 2011.

[68]  Rongrong Ji,et al.  Nonnegative Spectral Clustering with Discriminative Regularization , 2011, AAAI.

[69]  Ramakant Nevatia,et al.  How does person identity recognition help multi-person tracking? , 2011, CVPR 2011.

[70]  Ramakant Nevatia,et al.  Multi-target tracking by on-line learned discriminative appearance models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.