Hierarchical bilinear network for high performance face detection

Deep Convolutional Networks (DCNs) have achieved great success in face detection. Most architectures of the DCN-based methods, however, suffer from multiple separated steps and large-size models, which increase the training complexity and also slow down the testing speed. In this paper, we propose an efficient end-to-end architecture, called Hierarchical Bilinear Network (HBN), for fast and accurate face detection. It mainly consists of two parts: the Backbone Network and the Bilinear Network. The Backbone Network generates hierarchical feature maps for efficiently characterizing faces of different scales, while the Bilinear Network classifies the regions and regresses the face bounding-boxes on each feature map by introducing the Inception module and weights sharing. Benefited from the characters of the proposed architecture, it obtains a better comprehensive performance regarding the model effectiveness, running efficiency, and parameter size, compared with other DCN-based methods. Extensive experimental results demonstrate that our detector achieves competitive accuracy on both the FDDB database and the WIDER FACE database, while still runs in real time (about 69 FPS on a Titan Black GPU) with a tiny size (2.2 MB) model.

[1]  Gang Hua,et al.  Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation , 2013, 2013 IEEE International Conference on Computer Vision.

[2]  Gang Hua,et al.  A convolutional neural network cascade for face detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Peiyun Hu,et al.  Finding Tiny Faces , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Shuo Yang,et al.  WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Shuo Yang,et al.  From Facial Parts Responses to Face Detection: A Deep Learning Approach , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[8]  Bin Yang,et al.  Aggregate channel features for multi-view face detection , 2014, IEEE International Joint Conference on Biometrics.

[9]  Harry Shum,et al.  Statistical Learning of Multi-view Face Detection , 2002, ECCV.

[10]  Huaizu Jiang,et al.  Face Detection with the Faster R-CNN , 2016, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[11]  Gang Hua,et al.  Supervised Transformer Network for Efficient Face Detection , 2016, ECCV.

[12]  Erik Learned-Miller,et al.  FDDB: A benchmark for face detection in unconstrained settings , 2010 .

[13]  Yuan Li,et al.  High-Performance Rotation Invariant Multiview Face Detection , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Bin Yang,et al.  Convolutional Channel Features , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[17]  Luc Van Gool,et al.  Face Detection without Bells and Whistles , 2014, ECCV.

[18]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[19]  Rama Chellappa,et al.  A deep pyramid Deformable Part Model for face detection , 2015, 2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[20]  Stefanos Zafeiriou,et al.  A survey on face detection in the wild: Past, present and future , 2015, Comput. Vis. Image Underst..

[21]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[22]  Bo Wu,et al.  Fast rotation invariant multi-view face detection based on real Adaboost , 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings..