Paper Information - Semantically Synchronizing Multiple-Camera Systems with Human Pose Estimation

Semantically Synchronizing Multiple-Camera Systems with Human Pose Estimation

Multiple-camera systems can expand coverage and mitigate occlusion problems. However, temporal synchronization remains a problem for budget cameras and capture devices. We propose an out-of-the-box framework that temporally synchronizes multiple cameras using semantic human pose estimates from the videos. Human pose predictions are obtained for each camera with an off-the-shelf pose estimator. First, our method synchronizes each pair of cameras by minimizing an energy function related to epipolar distances. We also propose a simple yet effective algorithm for associating multiple people across cameras and a score-regularized energy function for improved performance. Second, we integrate the synchronized camera pairs into a graph and derive the optimal temporal displacement configuration for the multiple-camera system. We evaluate our method on four public benchmark datasets and demonstrate robust sub-frame synchronization accuracy on all of them.

Wenhu Qin | Chunyu Wang | Zhe Zhang
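
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the fundamental matrix `F` between the two cameras is known, joints are already associated across views, only integer frame offsets are searched (the paper reports sub-frame accuracy, which this sketch does not attempt), and the graph stage is approximated by a minimum spanning tree over pairwise energies. The function names `epipolar_energy`, `best_offset`, and `global_offsets` are hypothetical.

```python
import numpy as np

def epipolar_energy(pts_a, pts_b, F, offset):
    """Mean distance from joints in camera B to the epipolar lines of the
    matching joints in camera A, pairing frame t of A with frame t + offset
    of B. pts_a and pts_b have shape (frames, joints, 2)."""
    n = min(len(pts_a), len(pts_b))
    t = np.arange(max(0, -offset), min(n, n - offset))  # overlapping frames
    if len(t) == 0:
        return float("inf")
    a = pts_a[t].reshape(-1, 2)
    b = pts_b[t + offset].reshape(-1, 2)
    a_h = np.hstack([a, np.ones((len(a), 1))])  # homogeneous coordinates
    b_h = np.hstack([b, np.ones((len(b), 1))])
    lines = a_h @ F.T                           # each row is the line F @ a
    num = np.abs(np.sum(lines * b_h, axis=1))
    den = np.hypot(lines[:, 0], lines[:, 1]) + 1e-12
    return float(np.mean(num / den))

def best_offset(pts_a, pts_b, F, max_shift):
    """Exhaustive search for the integer offset with the lowest energy."""
    shifts = range(-max_shift, max_shift + 1)
    return min(shifts, key=lambda s: epipolar_energy(pts_a, pts_b, F, s))

def global_offsets(n_cams, pairs):
    """pairs maps (i, j) to (offset of j relative to i, energy).
    Assumption: keep the lowest-energy spanning tree (Kruskal's algorithm),
    then propagate offsets from camera 0 along the tree."""
    parent = list(range(n_cams))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = {c: [] for c in range(n_cams)}
    for (i, j), (off, _) in sorted(pairs.items(), key=lambda kv: kv[1][1]):
        ri, rj = find(i), find(j)
        if ri != rj:                 # edge joins two components: keep it
            parent[ri] = rj
            tree[i].append((j, off))
            tree[j].append((i, -off))
    offsets, stack = {0: 0}, [0]
    while stack:
        c = stack.pop()
        for nb, off in tree[c]:
            if nb not in offsets:
                offsets[nb] = offsets[c] + off
                stack.append(nb)
    return offsets

# Synthetic check: two cameras related by a pure sideways translation, so
# epipolar lines are horizontal and matching joints share the same y value.
F = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
frames, joints, true_offset = 60, 3, 3

def joint_y(t, j):
    # Hypothetical non-periodic vertical motion of joint j at time t.
    return 0.05 * t ** 2 + 20.0 * np.sin(0.4 * t) + 10.0 * j

pts_a = np.array([[[j, joint_y(t, j)] for j in range(joints)]
                  for t in range(frames)], dtype=float)
pts_b = np.array([[[j + 5, joint_y(s - true_offset, j)] for j in range(joints)]
                  for s in range(frames)], dtype=float)

estimated = best_offset(pts_a, pts_b, F, max_shift=10)
pairwise = {(0, 1): (3, 0.1), (1, 2): (-2, 0.2), (0, 2): (5, 9.0)}
configuration = global_offsets(3, pairwise)
```

In the synthetic example the high-energy pair `(0, 2)` is left out of the spanning tree, so its inconsistent offset does not affect the final configuration. A real system would also need the cross-camera person association and the score regularization that the abstract mentions, both of which are omitted here.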
