A Joint System for Person Tracking and Face Detection

Visual detection and tracking of humans in complex scenes is a challenging problem with a wide range of applications, for example surveillance and human-computer interaction. In many such applications, time-synchronous views from multiple calibrated cameras are available, and both frame-view and space-level human location information is desired. In such scenarios, efficiently combining the strengths of face detection and person tracking is a viable approach that can provide both levels of information required and improve robustness. In this paper, we propose a novel vision system that detects and tracks human faces automatically, using input from multiple calibrated cameras. The method uses an Adaboost algorithm variant combined with mean shift tracking applied on single camera views for face detection and tracking, and fuses the results on multiple camera views to check for consistency and obtain the three-dimensional head estimate. We apply the proposed system to a lecture scenario in a smart room, on a corpus collected as part of the CHIL European Union integrated project. We report results on both frame-level face detection and three-dimensional head tracking. For the latter, the proposed algorithm achieves similar results with the IBM “PeopleVision” system.

[1]  Anil K. Jain,et al.  Feature Selection: Evaluation, Application, and Small Sample Performance , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  Stan Z. Li,et al.  Real-time multi-view face detection , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[3]  Takeo Kanade,et al.  A statistical method for 3D object detection applied to faces and cars , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[4]  Climent Nadeu,et al.  Automatic Speech Activity Detection, Source Localization, and Speech Recognition on the Chil Seminar Corpus , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[5]  Tim J. Ellis,et al.  Multi camera image tracking , 2006, Image Vis. Comput..

[6]  Greg Welch,et al.  An Introduction to Kalman Filter , 1995, SIGGRAPH 2001.

[7]  Josef Kittler,et al.  Floating search methods in feature selection , 1994, Pattern Recognit. Lett..

[8]  Dorin Comaniciu,et al.  Real-time tracking of non-rigid objects using mean shift , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[9]  Takeo Kanade,et al.  Neural Network-Based Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[10]  J. Friedman Special Invited Paper-Additive logistic regression: A statistical view of boosting , 2000 .

[11]  Yoram Singer,et al.  Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.

[12]  Sharath Pankanti,et al.  Face cataloger: multi-scale imaging for relating identity to location , 2003, Proceedings of the IEEE Conference on Advanced Video and Signal Based Surveillance, 2003..

[13]  Pavel Paclík,et al.  Adaptive floating search methods in feature selection , 1999, Pattern Recognit. Lett..

[14]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[15]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  Michael Isard,et al.  CONDENSATION—Conditional Density Propagation for Visual Tracking , 1998, International Journal of Computer Vision.

[17]  Narendra Ahuja,et al.  A SNoW-Based Face Detector , 1999, NIPS.