Paper Information - PHD: A Deep Learning Based Human Detection Framework for Panoramic Videos

PHD: A Deep Learning Based Human Detection Framework for Panoramic Videos

Panoramic video has attracted substantial research attention as an emerging video format. It provides a 360-degree immersive experience of omnidirectional visual information. State-of-the-art detection networks may fail to detect humans in spherical images, which are normally represented in deformed rectangular shapes. In this paper, we propose a so-called Panoramic Human Detection (PHD) scheme to address the task of human detection in panoramic videos. The PHD method detects humans by extracting multiple overlapping sub-images from each integral spherical image, where three-dimensional rotation of the spherical image is employed to ensure consistency of the sub-images. Two detection box filters are designed to remove redundant boxes. The PHD method is capable of human detection in various panoramic video types. Experiments show that the PHD method outperforms the baseline by 35% and 48.6% in terms of precision and recall, respectively.
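The three-dimensional rotation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the equirectangular layout, the axis conventions (yaw about the vertical z axis, pitch about the y axis), and all function names below are assumptions made for illustration. The sketch maps a pixel to a point on the unit sphere, rotates that point, and maps it back to pixel coordinates.

```python
import math


def pixel_to_lonlat(x, y, width, height):
    """Map an equirectangular pixel to (longitude, latitude) in radians.

    Assumed layout: x spans longitude [-pi, pi), y spans latitude
    from +pi/2 (top row) to -pi/2 (bottom row).
    """
    lon = (x / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (y / height) * math.pi
    return lon, lat


def lonlat_to_pixel(lon, lat, width, height):
    """Inverse of pixel_to_lonlat; x wraps around horizontally."""
    x = (((lon + math.pi) / (2.0 * math.pi)) * width) % width
    y = ((math.pi / 2.0 - lat) / math.pi) * height
    return x, y


def lonlat_to_vec(lon, lat):
    """Convert spherical angles to a unit vector (x, y, z)."""
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def vec_to_lonlat(v):
    """Convert a unit vector back to (longitude, latitude)."""
    x, y, z = v
    lon = math.atan2(y, x)
    lat = math.asin(max(-1.0, min(1.0, z)))  # clamp against rounding error
    return lon, lat


def rotate_yaw_pitch(v, yaw, pitch):
    """Rotate v about the z axis by yaw, then about the y axis by pitch."""
    x, y, z = v
    # Yaw: rotation about the vertical (z) axis.
    x1 = math.cos(yaw) * x - math.sin(yaw) * y
    y1 = math.sin(yaw) * x + math.cos(yaw) * y
    z1 = z
    # Pitch: rotation about the y axis.
    x2 = math.cos(pitch) * x1 + math.sin(pitch) * z1
    z2 = -math.sin(pitch) * x1 + math.cos(pitch) * z1
    return (x2, y1, z2)


def rotate_pixel(x, y, width, height, yaw, pitch=0.0):
    """Return the pixel position of (x, y) after rotating the sphere."""
    lon, lat = pixel_to_lonlat(x, y, width, height)
    v = rotate_yaw_pitch(lonlat_to_vec(lon, lat), yaw, pitch)
    new_lon, new_lat = vec_to_lonlat(v)
    return lonlat_to_pixel(new_lon, new_lat, width, height)
```

Under these assumptions, a pure yaw rotation only shifts pixels horizontally, by `width * yaw / (2 * pi)` with wrap-around. A rotation with non-zero pitch moves content between the heavily distorted polar rows and the less distorted equatorial rows, which is one plausible reason to rotate the sphere before cropping sub-images.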

Peichang Zhang | Yongkai Huo | Jinting Tang | Zhenhui Chen
