ViS-HuD: Using Visual Saliency to Improve Human Detection with Convolutional Neural Networks

The paper presents a technique to improve human detection in still images using deep learning. Our novel method, ViS-HuD, computes visual saliency map from the image. Then the input image is multiplied by the map and product is fed to the Convolutional Neural Network (CNN) which detects humans in the image. A visual saliency map is generated using ML-Net and human detection is carried out using DetectNet. ML-Net is pre-trained on SALICON while, DetectNet is pre-trained on ImageNet database for visual saliency detection and image classification respectively. The CNNs of ViS-HuD were trained on two challenging databases - Penn Fudan and TUD-Brussels Benchmark. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on Penn Fudan Dataset with 91.4% human detection accuracy and it achieves average miss-rate of 53% on the TUD-Brussels benchmark.

[1]  Mei-Chen Yeh,et al.  Fast Human Detection Using a Cascade of Histograms of Oriented Gradients , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[2]  Xiaogang Wang,et al.  Single-Pedestrian Detection Aided by Multi-pedestrian Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Nuno Vasconcelos,et al.  Learning Optimal Seeds for Diffusion-Based Salient Object Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Nanning Zheng,et al.  Learning to Detect a Salient Object , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Honglak Lee,et al.  Learning and Selecting Features Jointly with Point-wise Gated Boltzmann Machines , 2013, ICML.

[6]  Jingdong Wang,et al.  Salient Object Detection: A Discriminative Regional Feature Integration Approach , 2013, International Journal of Computer Vision.

[7]  Qi Zhao,et al.  SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[8]  Yusuf Engin Tetik,et al.  Pedestrian detection from still images , 2011, 2011 International Symposium on Innovations in Intelligent Systems and Applications.

[9]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[11]  Shanmuganathan Raman,et al.  Facial Expression Recognition Using Visual Saliency and Deep Learning , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[12]  William Lopez,et al.  Cascade Classifiers and Saliency Maps Based People Detection , 2017, AVR.

[13]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[14]  Ali Borji,et al.  Salient Object Detection: A Benchmark , 2015, IEEE Transactions on Image Processing.

[15]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[16]  Shuicheng Yan,et al.  An HOG-LBP human detector with partial occlusion handling , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[17]  Bernt Schiele,et al.  Detection and Tracking of Occluded People , 2014, International Journal of Computer Vision.

[18]  Monserrate Intriago-Pazmiño,et al.  Algorithms for People Recognition in Digital Images: A Systematic Review and Testing , 2017, WorldCIST.

[19]  Long Zhu,et al.  Learning a Hierarchical Deformable Template for Rapid Deformable Object Parsing , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[21]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Hongyang Chao,et al.  Learning Shape Detector by Quantizing Curve Segments with Multiple Distance Metrics , 2010, ECCV.

[23]  Xiaogang Wang,et al.  Unsupervised Salience Learning for Person Re-identification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Xiaogang Wang,et al.  Switchable Deep Network for Pedestrian Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  S. Süsstrunk,et al.  Frequency-tuned salient region detection , 2009, CVPR 2009.

[26]  Yu Yan,et al.  An Unconstrained Face Detection Algorithm Based on Visual Saliency , 2017, EIDWT.

[27]  Tao Xiang,et al.  Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Ayesha Gurnani,et al.  VEGAC: Visual Saliency-based Age, Gender, and Facial Expression Classification Using Convolutional Neural Networks , 2018, ArXiv.

[29]  Xiaogang Wang,et al.  Joint Deep Learning for Pedestrian Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[30]  Xiaogang Wang,et al.  A discriminative deep model for pedestrian detection with occlusion handling , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Bernt Schiele,et al.  Filtered channel features for pedestrian detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Shi-Min Hu,et al.  Global contrast based salient region detection , 2011, CVPR 2011.

[33]  Huchuan Lu,et al.  Deep networks for saliency detection via local estimation and global search , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Ali Farhadi,et al.  Recognition using visual phrases , 2011, CVPR 2011.

[35]  Pietro Perona,et al.  Integral Channel Features , 2009, BMVC.

[36]  Liang Lin,et al.  Dynamical And-Or Graph Learning for Object Shape Modeling and Detection , 2012, NIPS.

[37]  Li Xu,et al.  Hierarchical Saliency Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Ayesha Gurnani,et al.  Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[39]  Larry S. Davis,et al.  Shape-Based Human Detection and Segmentation via Hierarchical Part-Template Matching , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  B. Schiele,et al.  Multi-cue onboard pedestrian detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Dumitru Erhan,et al.  Scalable, High-Quality Object Detection , 2014, ArXiv.

[42]  Jian Sun,et al.  Saliency Optimization from Robust Background Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Rita Cucchiara,et al.  A deep multi-level network for saliency prediction , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[44]  Gang Song,et al.  Object Detection Combining Recognition and Segmentation , 2007, ACCV.

[45]  Yann LeCun,et al.  Pedestrian Detection with Unsupervised Multi-stage Feature Learning , 2012, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  ZhuLong,et al.  Learning a Hierarchical Deformable Template for Rapid Deformable Object Parsing , 2010 .

[47]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Guoshan Zhang,et al.  A Saliency Based Human Detection Framework for Infrared Thermal Images , 2017, CCCV.

[49]  Yibin Li,et al.  Facial Expression Recognition with Fusion Features Extracted from Salient Facial Areas , 2017, Sensors.

[50]  Paul A. Viola,et al.  Detecting Pedestrians Using Patterns of Motion and Appearance , 2005, International Journal of Computer Vision.

[51]  Huchuan Lu,et al.  Saliency Detection via Absorbing Markov Chain , 2013, 2013 IEEE International Conference on Computer Vision.

[52]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[53]  Weria Khaksar,et al.  Facial Expression Recognition Using Salient Features and Convolutional Neural Network , 2017, IEEE Access.

[54]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[55]  Xiaogang Wang,et al.  Person Re-identification by Salience Matching , 2013, 2013 IEEE International Conference on Computer Vision.

[56]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[57]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[58]  B. Schiele,et al.  How Far are We from Solving Pedestrian Detection? , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Aurobinda Routray,et al.  Automatic facial expression recognition using features of salient facial patches , 2015, IEEE Transactions on Affective Computing.