Real-Time Face Detection and Motion Analysis With Application in “Liveness” Assessment

A robust face detection technique along with mouth localization, processing every frame in real time (video rate), is presented. Moreover, it is exploited for motion analysis onsite to verify "liveness" as well as to achieve lip reading of digits. A methodological novelty is the suggested quantized angle features ("quangles") being designed for illumination invariance without the need for preprocessing (e.g., histogram equalization). This is achieved by using both the gradient direction and the double angle direction (the structure tensor angle), and by ignoring the magnitude of the gradient. Boosting techniques are applied in a quantized feature space. A major benefit is reduced processing time (i.e., that the training of effective cascaded classifiers is feasible in very short time, less than 1 h for data sets of order 104). Scale invariance is implemented through the use of an image scale pyramid. We propose "liveness" verification barriers as applications for which a significant amount of computation is avoided when estimating motion. Novel strategies to avert advanced spoofing attempts (e.g., replayed videos which include person utterances) are demonstrated. We present favorable results on face detection for the YALE face test set and competitive results for the CMU-MIT frontal face test set as well as on "liveness" verification barriers.

[1]  Ian H. Witten,et al.  Detecting Replay Attacks in Audiovisual Identity Verification , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[2]  James C. Bezdek,et al.  On cluster validity for the fuzzy c-means model , 1995, IEEE Trans. Fuzzy Syst..

[3]  A. Sheikholeslami,et al.  Real-time face detection and lip feature extraction using field-programmable gate arrays , 2006, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[4]  Josef Bigün,et al.  Evaluating liveness by face images and the structure tensor , 2005, Fourth IEEE Workshop on Automatic Identification Advanced Technologies (AutoID'05).

[5]  Federico Girosi,et al.  Training support vector machines: an application to face detection , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  Harry Shum,et al.  Statistical Learning of Multi-view Face Detection , 2002, ECCV.

[7]  Fabrizio Smeraldi,et al.  Retinal vision applied to facial features detection and face authentication , 2002, Pattern Recognit. Lett..

[8]  Takeo Kanade,et al.  Probabilistic modeling of local appearance and spatial relationships for object recognition , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[9]  M. Turk,et al.  Eigenfaces for Recognition , 1991, Journal of Cognitive Neuroscience.

[10]  Bernhard Fröba,et al.  Real-Time Face Detection Using Edge-Orientation Matching , 2001, AVBPA.

[11]  Tieniu Tan,et al.  Live face detection based on the analysis of Fourier spectra , 2004, SPIE Defense + Commercial Sensing.

[12]  Juergen Luettin,et al.  Audio-Visual Speech Modelling for Continuous Speech Recognition , 2000 .

[13]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[14]  Jiri Matas,et al.  WaldBoost - learning for time constrained sequential detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[15]  Narendra Ahuja,et al.  Detecting Faces in Images: A Survey , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  Tomaso A. Poggio,et al.  Example-Based Learning for View-Based Human Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  Juergen Luettin,et al.  Speaker identification by lipreading , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[18]  Tanzeem Choudhury,et al.  Multimodal person recognition using unconstrained audio and video , 1998 .

[19]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[20]  David J. Kriegman,et al.  Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection , 1996, ECCV.

[21]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.

[22]  Jiri Matas,et al.  XM2VTSDB: The Extended M2VTS Database , 1999 .

[23]  Josef Bigün,et al.  Recognition by symmetry derivatives and the generalized structure tensor , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Juergen Luettin,et al.  Audio-Visual Speech Modeling for Continuous Speech Recognition , 2000, IEEE Trans. Multim..

[25]  Josef Bigün,et al.  Person Verification by Lip-Motion , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[26]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[27]  J. Bigun,et al.  Assuring liveness in biometric identity authentication by real-time face tracking , 2004, Proceedings of the 2004 IEEE International Conference on Computational Intelligence for Homeland Security and Personal Safety, 2004. CIHSPS 2004..

[28]  Jiri Matas,et al.  Feature-based affine-invariant localization of faces , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Takeo Kanade,et al.  Human Face Detection in Visual Scenes , 1995, NIPS.