Paper Information - Deep Occlusion Reasoning for Multi-camera Multi-target Detection

Deep Occlusion Reasoning for Multi-camera Multi-target Detection

People detection in single 2D images has improved greatly in recent years. However, comparatively little of this progress has percolated into multi-camera multi-people tracking algorithms, whose performance still degrades severely when scenes become very crowded. In this work, we introduce a new architecture that combines Convolutional Neural Nets and Conditional Random Fields to explicitly model the ambiguities that arise in such crowded scenes. Among its key ingredients are high-order CRF terms that model potential occlusions and make our approach robust even when many people are present. Our model is trained end-to-end, and we show that it outperforms several state-of-the-art algorithms on challenging scenes.
