An audio-visual database for evaluating person tracking algorithms

This paper presents an audio-visual database that can serve as a reference for testing and evaluating video, audio, or joint audio-visual person tracking algorithms, as well as speaker localization methods. It can also be used to test face detection and pose estimation algorithms. The database contains a number of scenes, ranging from simple ones to complex ones that can challenge existing algorithms. The scenes include different subjects whose appearance can cause problems for video tracking algorithms (e.g. facial features such as beards or glasses); optimal and artificially created sub-optimal lighting conditions; subject movement along simple as well as random motion trajectories; different distances from the camera and microphones; and occlusion. The database incorporates ground truth data (3D position over time) obtained from a commercially available 4-camera infrared (IR) tracking system. Examples of how the database can be used to evaluate video and audio tracking algorithms are also provided.
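As an illustration of how 3D ground truth of this kind might be used, the following minimal sketch compares a tracker's estimated positions with ground-truth positions and reports a per-sample Euclidean error and its root-mean-square value. It is not the evaluation procedure of the paper: the function names, the in-memory data layout (lists of timestamps and `(x, y, z)` tuples), and the choice of linear interpolation to align the two time bases are all assumptions made for the example.

```python
import math
from bisect import bisect_left


def interpolate_position(gt_times, gt_positions, t):
    """Linearly interpolate the ground-truth 3D position at time t.

    Assumes gt_times is strictly increasing (hypothetical data layout).
    Times outside the recorded range are clamped to the nearest sample.
    """
    if t <= gt_times[0]:
        return gt_positions[0]
    if t >= gt_times[-1]:
        return gt_positions[-1]
    i = bisect_left(gt_times, t)  # first index with gt_times[i] >= t
    t0, t1 = gt_times[i - 1], gt_times[i]
    w = (t - t0) / (t1 - t0)
    return tuple(a + w * (b - a)
                 for a, b in zip(gt_positions[i - 1], gt_positions[i]))


def tracking_errors(gt_times, gt_positions, est_times, est_positions):
    """Euclidean distance between each estimate and the interpolated ground truth."""
    errors = []
    for t, est in zip(est_times, est_positions):
        ref = interpolate_position(gt_times, gt_positions, t)
        errors.append(math.dist(est, ref))
    return errors


def rmse(errors):
    """Root-mean-square of a non-empty list of per-sample errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Interpolation is used here because a tracker under test and the IR reference system need not sample at the same instants; if the two streams are already synchronized, a direct sample-by-sample comparison would suffice.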
