Not Too Deep CNN for Face Detection in Real Life Scenario

This article presents our recent study of a moderately deep neural network architecture for detecting faces of widely varying sizes and orientations. One goal of this work is to achieve sufficiently low latency and acceptable true detection rates on low-resolution video or still-image data. Several attempts have been made over the years to design a robust and generic face detection system, but owing to the inherent complexity of the problem, localizing faces in complex, low-quality images remains an open problem. Typical challenges with such data include visual variations due to lighting conditions, facial expressions, occlusion, etc. Moreover, existing state-of-the-art systems usually involve very large network architectures that require substantial computational resources for training. In the present work, we have designed a moderately deep Convolutional Neural Network (CNN) architecture suitable for use on commonly available computing devices. We have also proposed some simple strategies for calibrating a bounding box that is trained to localize a face even in poor lighting conditions and in various typical occlusion scenarios. The CNN of the proposed framework receives an input image at three different resolutions to detect faces of various sizes. Simulation results of the proposed approach on the publicly available “WIDER FACE” database and on another database of 27,576 images/video frames collected by us establish its effectiveness in certain real-life scenarios.
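To illustrate the idea of presenting one image to a network at three resolutions, the following minimal sketch builds three copies of an input by block averaging and maps a box found at a coarser resolution back to the original pixel grid. The abstract does not state the actual resolutions or the resampling method, so the downscaling factors (1, 2, 4), the block-averaging step, and the helper names `downscale`, `three_resolution_inputs` and `to_original_coords` are all assumptions made here for illustration only.

```python
import numpy as np


def downscale(image: np.ndarray, factor: int) -> np.ndarray:
    """Shrink an H x W x C image by an integer factor using block averaging.

    Block averaging is an assumed resampling method, not taken from the paper.
    """
    if factor == 1:
        return image.copy()
    h, w = image.shape[:2]
    # Crop so that both sides are divisible by the factor.
    h_crop, w_crop = h - h % factor, w - w % factor
    cropped = image[:h_crop, :w_crop]
    # Group pixels into factor x factor blocks and average each block.
    blocks = cropped.reshape(h_crop // factor, factor, w_crop // factor, factor, -1)
    return blocks.mean(axis=(1, 3))


def three_resolution_inputs(image: np.ndarray, factors=(1, 2, 4)) -> list:
    """Return the image at three resolutions (factors are hypothetical)."""
    as_float = image.astype(np.float32)
    return [downscale(as_float, f) for f in factors]


def to_original_coords(box: tuple, factor: int) -> tuple:
    """Map an (x, y, width, height) box from a downscaled image to the original."""
    x, y, w, h = box
    return (x * factor, y * factor, w * factor, h * factor)
```

In such a scheme, a fixed-size detection window covers a larger region of the scene at the coarser resolutions, so large faces can be picked up in the downscaled copies and small faces in the full-resolution copy.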
