A System for Tracking and Recognizing Multiple People with Multiple Cameras

In this paper we present a robust real-time method for tracking and recognizing multiple people with multiple cameras. Our method uses both static and Pan-Tilt-Zoom (PTZ) cameras to provide visual attention. The PTZ camera system uses face recognition to register people in the scene and “lock on” to those individuals. The static camera system provides a global view of the environment and is used to readjust tracking when the PTZ cameras lose their targets. The system works well even when people occlude one another. The underlying visual processes rely on color segmentation, movement tracking, and shape information to locate target candidates. Color indexing and face recognition modules help register these candidates with the system.
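
The abstract does not say how the color indexing module is implemented. As an illustration only, the sketch below assumes one common formulation: each person is described by a coarse, normalized RGB histogram, and a new candidate is matched to the registered person whose histogram overlaps it most (histogram intersection). The function names, the bin count, and the sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def color_histogram(pixels, bins=8):
    """Build a normalized, coarse RGB histogram from (r, g, b) tuples in 0..255.

    Assumption: 8 bins per channel; the paper does not specify a quantization.
    """
    counts = Counter(
        (r * bins // 256, g * bins // 256, b * bins // 256) for r, g, b in pixels
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell: n / total for cell, n in counts.items()}


def histogram_intersection(h1, h2):
    """Overlap of two normalized histograms: 0.0 (disjoint) to 1.0 (identical)."""
    return sum(min(v, h2.get(cell, 0.0)) for cell, v in h1.items())


def best_match(candidate, registered):
    """Return (name, score) of the registered histogram closest to the candidate.

    `registered` maps a hypothetical person label to that person's histogram.
    Returns (None, 0.0) when nothing overlaps.
    """
    best_name, best_score = None, 0.0
    for name, hist in registered.items():
        score = histogram_intersection(candidate, hist)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Illustrative data: two registered people with distinct clothing colors.
registered = {
    "person_a": color_histogram([(250, 10, 10)] * 50),   # mostly red
    "person_b": color_histogram([(10, 10, 250)] * 50),   # mostly blue
}
candidate = color_histogram([(240, 20, 20)] * 30)         # a reddish region
name, score = best_match(candidate, registered)
```

In a sketch like this, a tracker that loses a target could compare each new candidate region against the stored histograms and hand the best-scoring identity back to the face recognition stage for confirmation; a minimum score threshold would normally be added so that unfamiliar people are not forced onto an existing identity.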
