Generalizable Multi-Camera 3D Pedestrian Detection

We present a multi-camera 3D pedestrian detection method that requires no training on data from the target scene. In each camera view, we estimate each pedestrian's ground location using a novel heuristic based on human body poses and person bounding boxes from an off-the-shelf monocular detector. We then project these locations onto the world ground plane and fuse them using a new formulation of a clique cover problem. We also propose an optional step that exploits pedestrian appearance during fusion by using a domain-generalizable person re-identification model. We evaluated the proposed approach on the challenging WILDTRACK dataset, where it obtained a MODA of 0.569 and an F-score of 0.78, superior to state-of-the-art generalizable detection techniques.
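
The projection and fusion steps can be illustrated with a minimal sketch. It is not the method described above: it assumes each camera provides a 3x3 image-to-ground homography, and it replaces the clique cover formulation with a simple greedy grouping in which every group is a clique (all members lie within a distance threshold of one another and come from different cameras). The function names, the `radius` parameter, the distinct-camera constraint, and the sample values are all hypothetical.

```python
import numpy as np

def image_to_ground(H, points_px):
    """Map pixel coordinates to ground-plane coordinates with a 3x3 homography H."""
    pts = np.asarray(points_px, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # back to Cartesian

def greedy_fuse(ground_pts, cam_ids, radius):
    """Greedy stand-in for clique-cover fusion (assumption, not the paper's solver).

    A detection joins the first group whose members are all within `radius`
    of it and all come from other cameras; otherwise it starts a new group.
    """
    ground_pts = np.asarray(ground_pts, dtype=float)
    groups = []
    for i in range(len(ground_pts)):
        for g in groups:
            if all(cam_ids[i] != cam_ids[j]
                   and np.linalg.norm(ground_pts[i] - ground_pts[j]) <= radius
                   for j in g):
                g.append(i)
                break
        else:
            groups.append([i])
    centers = [ground_pts[g].mean(axis=0) for g in groups]
    return centers, groups

# Hypothetical homography: 1 pixel = 1 cm on the ground, shared by both cameras.
H = np.diag([0.01, 0.01, 1.0])
feet_px = [(100, 200), (500, 500), (105, 198)]  # estimated foot points in pixels
cam_ids = [0, 0, 1]                             # camera that produced each point

ground = image_to_ground(H, feet_px)
centers, groups = greedy_fuse(ground, cam_ids, radius=0.5)
print(groups)
print(centers)
```

In this toy input, the first and third detections come from different cameras and land about 5 cm apart on the ground plane, so they are merged into one pedestrian, while the second detection stays on its own.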