BoxFlow: Unsupervised Face Detector Adaptation from Images to Videos

Face detectors are usually trained on static images but deployed in the wild, for example on surveillance videos. Because of the domain shift between images and videos, directly applying image-based face detectors to videos usually gives unsatisfactory performance. In this paper, we introduce BoxFlow, a new unsupervised detector adaptation method that effectively adapts a face detector pre-trained on static images to videos. BoxFlow adapts face detectors without supervision by fully exploiting the motion context across video frames. In particular, BoxFlow introduces three novel components: (1) a generalized heat map representation of face locations with augmented shape flexibility; (2) motion-based temporal contextual regularization among adjacent frames for unsupervised refinement of face detections; and (3) a self-paced learning strategy that progressively adapts face detectors from easy data samples to challenging ones. With these key components, we develop a systematic unsupervised adaptation framework that helps face detectors adapt to various deployment environments. Extensive experiments on the IDA dataset demonstrate the superiority of the proposed method. Without using any annotation, BoxFlow achieves a gain of about 10%-20% in Average Precision over directly applying image-based face detectors.
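The abstract does not give formulas for components (2) and (3), so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes boxes are `(x1, y1, x2, y2)` tuples, that the motion between two adjacent frames is summarized by a single displacement `(dx, dy)`, and that a detection counts as "easy" when it agrees with its motion-shifted predecessor. All function names, the IoU-based score, and the linear threshold schedule are hypothetical choices made for this example.

```python
def shift_box(box, dx, dy):
    """Move a box (x1, y1, x2, y2) by an assumed motion displacement (dx, dy)."""
    x1, y1, x2, y2 = box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def temporal_inconsistency(box_t, box_next, dx, dy):
    """Hypothetical score: 0 when the detection in the next frame matches the
    motion-shifted detection from the current frame, 1 when they are disjoint."""
    return 1.0 - iou(shift_box(box_t, dx, dy), box_next)


def self_paced_selection(scores, start_threshold, step, rounds):
    """For each round, return the indices of samples whose score is at or below
    the current threshold. The threshold grows by `step` per round, so harder
    samples are admitted progressively (an assumed linear schedule)."""
    selected = []
    threshold = start_threshold
    for _ in range(rounds):
        selected.append([i for i, s in enumerate(scores) if s <= threshold])
        threshold += step
    return selected


# Toy usage: three detections tracked across one frame pair each.
scores = [
    temporal_inconsistency((0, 0, 10, 10), (2, 0, 12, 10), 2, 0),   # consistent
    temporal_inconsistency((0, 0, 10, 10), (7, 0, 17, 10), 2, 0),   # partial overlap
    temporal_inconsistency((0, 0, 10, 10), (50, 50, 60, 60), 2, 0), # disjoint
]
curriculum = self_paced_selection(scores, start_threshold=0.1, step=0.5, rounds=3)
```

In this toy setup the consistent detection is admitted in the first round and the less consistent ones only as the threshold rises; in an actual adaptation loop each round's selection would be used to fine-tune the detector before the scores are recomputed.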
