Dimension Invariant Model for Human Head Detection

Detecting heads under wide variations in camera viewpoint, human pose, appearance, and scale is a key problem for many computer vision applications. Region-based convolutional neural networks (RCNNs) have achieved considerable success in handling variations in pose and appearance. However, RCNNs are inefficient at handling human heads of diverse scales. In this work, we focus on detecting human heads in complex scenes. Starting from the traditional RCNN model, we extend it by leveraging person-scene relations and propose a dimension-invariant convolutional neural network (DCNN) that coarsely predicts the locations and scales of heads directly from the full image. We evaluate our method and compare it with well-known methods on two benchmark datasets. Experimental results show that our method outperforms them.
