Synergistic Face Detection and Pose Estimation with Energy-Based Models

We describe a novel method for real-time, simultaneous multi-view face detection and facial pose estimation. The method employs a convolutional network to map face images to points on a manifold parametrized by pose, and non-face images to points far from that manifold. This network is trained by optimizing a loss function of three variables: image, pose, and face/non-face label. We test the resulting system, in a single configuration, on three standard data sets (one for frontal pose, one for rotated faces, and one for profiles) and find that its performance on each set is comparable to that of previous multi-view face detectors that can only handle one form of pose variation. We also show experimentally that the system's accuracy on both face detection and pose estimation is improved by training for the two tasks together.
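The inference step implied by this description can be illustrated with a minimal sketch. It assumes that the network output is a 3-vector, that the energy of a candidate pose is the Euclidean distance between that output and the corresponding manifold point, and that an input is labeled a face when the smallest such distance falls below a fixed threshold. The cosine parametrization of the manifold, the basis angles, and the names `manifold_point` and `detect` are hypothetical choices made for illustration; the abstract does not specify them, and the convolutional network itself is not modeled here.

```python
import math

# Assumed basis angles for a one-dimensional pose manifold (illustrative only).
BASIS = (-math.pi / 3, 0.0, math.pi / 3)


def manifold_point(theta):
    """Map a pose angle in radians to a point on the assumed face manifold."""
    return [math.cos(theta - a) for a in BASIS]


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def detect(g_x, threshold, n_grid=181):
    """Label a network output g_x as face or non-face and estimate its pose.

    Searches a grid of poses in [-pi/2, pi/2] for the manifold point nearest
    to g_x. Returns (is_face, best_theta, best_energy), where is_face is True
    when the minimum distance is below the threshold.
    """
    best_theta, best_energy = None, float("inf")
    for i in range(n_grid):
        theta = -math.pi / 2 + math.pi * i / (n_grid - 1)
        energy = distance(g_x, manifold_point(theta))
        if energy < best_energy:
            best_theta, best_energy = theta, energy
    return best_energy < threshold, best_theta, best_energy


# A point lying on the manifold is accepted and its pose recovered;
# a point far from the manifold is rejected.
print(detect(manifold_point(0.5), threshold=0.1))
print(detect([5.0, 5.0, 5.0], threshold=0.1))
```

In this sketch a single minimization yields both the face/non-face decision and the pose estimate, which is the sense in which the two tasks share one model.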
