As a first step toward a perceptual user interface, a computer vision color tracking algorithm is developed and applied to tracking human faces. Computer vision algorithms that are intended to form part of a perceptual user interface must be fast and efficient: they must track in real time without absorbing a major share of computational resources, so that other tasks can run while the visual interface is in use. The new algorithm developed here is based on a robust nonparametric technique for climbing density gradients to find the mode (peak) of probability distributions, called the mean shift algorithm. In our case, we want to find the mode of a color distribution within a video scene. The mean shift algorithm is therefore modified to deal with dynamically changing color probability distributions derived from video frame sequences. The modified algorithm is called the Continuously Adaptive Mean Shift (CAMSHIFT) algorithm. CAMSHIFT's tracking accuracy is compared against a Polhemus tracker, and its tolerance to noise and distractors, as well as its performance, is studied. CAMSHIFT is then used as a computer interface for controlling commercial computer games and for exploring immersive 3D graphic worlds.

Introduction

This paper is part of a program to develop a Perceptual User Interface for computers. Perceptual interfaces are ones in which the computer is given the ability to sense and produce analogs of the human senses: allowing computers to perceive and produce localized sound and speech, giving computers a sense of touch and force feedback, and, in our case, giving computers the ability to see. The work described in this paper is part of a larger effort aimed at giving computers the ability to segment, track, and understand the pose, gestures, and emotional expressions of humans and the tools they might be using in front of a computer or set-top box. In this paper we describe the development of the first core module in this effort: a 4-degree-of-freedom color object tracker and its application to flesh-tone-based face tracking.

Computer vision face tracking is an active and developing field, yet the face trackers that have been developed are not sufficient for our needs. Elaborate methods such as tracking contours with snakes [10][12][13], using eigenspace matching techniques [14], maintaining large sets of statistical hypotheses [15], or convolving images with feature detectors [16] are far too computationally expensive. We want a tracker that will track a given face in the presence of noise, other faces, and hand movements. Moreover, it must run fast and efficiently so that objects may be tracked in real time (30 frames per second) while consuming as few system resources as possible. In other words, this tracker should be able to serve as part of a user interface that is in turn part of the computational tasks a computer might routinely be expected to carry out. The tracker also needs to run on inexpensive consumer cameras and must not require calibrated lenses. To find a fast, simple algorithm for basic tracking, we have therefore focused on color-based tracking [7][8][9][10][11], yet even these simpler algorithms are too computationally complex (and therefore slower at any given CPU speed) due to their use of color correlation, blob and region growing, Kalman filter smoothing and prediction, and contour considerations.
The complexity of these algorithms derives from their attempts to deal with irregular object motion due to perspective (objects near the camera appear to move faster than distant objects); image noise; distractors, such as other faces in the scene; facial occlusion by hands or other objects; and lighting variations. We want a fast, computationally efficient algorithm that handles these problems in the course of its operation, i.e., an algorithm that mitigates the above problems "for free."

To develop such an algorithm, we drew on ideas from robust statistics and probability distributions. Robust statistics are those that tend to ignore outliers in the data (points far away from the region of interest); robust algorithms thus help compensate for noise and distractors in the vision data. We therefore chose to use a robust nonparametric technique for climbing density gradients to find the mode of probability distributions, called the mean shift algorithm [2]. (The mean shift algorithm was never intended to be used as a tracking algorithm, but it is quite effective in this role.)

The mean shift algorithm operates on probability distributions. To track colored objects in video frame sequences, the color image data has to be represented as a probability distribution [1]; we use color histograms to accomplish this. Color distributions derived from video image sequences change over time, so the mean shift algorithm has to be modified to adapt dynamically to the probability distribution it is tracking. The new algorithm that meets all these requirements is called CAMSHIFT.

For face tracking, CAMSHIFT tracks the X, Y, and Area of the flesh color probability distribution representing a face. The area is used to derive Z, the distance from the camera, and head roll is tracked as a further degree of freedom. We then use the X, Y, Z, and Roll derived from CAMSHIFT face tracking as a perceptual user interface for controlling commercial computer games and for exploring 3D graphic virtual worlds.

[Figure: Block diagram of the CAMSHIFT algorithm. Choose the initial search window size and location; compute the color probability distribution (color histogram lookup on the HSV image) over a calculation region centered on, but larger than, the search window; find the center of mass within the search window; re-center the search window at the center of mass and find the area under it; repeat until converged; then report X, Y, Z, and Roll, and use (X, Y) to set the next search window center and 2·(area)^(1/2) to set its size.]
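To make the loop in the block diagram concrete, the following is a minimal sketch in Python/NumPy. It is an illustrative reconstruction, not the paper's implementation: the flesh-hue histogram `flesh_hist`, the quantized `hue` image, and the function names are assumptions introduced for the example, and the window-resize rule simply follows the 2·(area)^(1/2) step in the diagram.

# Illustrative sketch of mean shift mode seeking and one CAMSHIFT step.
# Assumptions (not from the paper): `flesh_hist` is a normalized hue histogram
# learned from sample face pixels, and `hue` is the frame's hue plane already
# quantized to histogram bin indices.
import numpy as np

def mean_shift(prob, window, max_iter=20):
    """Climb the density gradient of the 2D probability image `prob` by moving
    a fixed-size search window (x, y, w, h) to the center of mass of the
    probability under it, until the window stops moving (the mode)."""
    x, y, w, h = window
    for _ in range(max_iter):
        patch = prob[y:y + h, x:x + w]
        m00 = patch.sum()                      # zeroth moment: total probability
        if m00 == 0:
            break                              # nothing under the window to track
        ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
        dx = int(round((xs * patch).sum() / m00 - (w - 1) / 2))
        dy = int(round((ys * patch).sum() / m00 - (h - 1) / 2))
        x = int(np.clip(x + dx, 0, prob.shape[1] - w))
        y = int(np.clip(y + dy, 0, prob.shape[0] - h))
        if dx == 0 and dy == 0:                # converged on the mode
            break
    return x, y, w, h

def camshift_step(hue, flesh_hist, window):
    """One CAMSHIFT iteration for a video frame: back-project the flesh-color
    histogram to get a probability image, run mean shift, then adapt the
    search window size for the next frame from the area under the window."""
    # Histogram back-projection: each pixel's hue bin indexes the histogram,
    # giving the probability that the pixel is flesh-colored.
    prob = flesh_hist[hue]

    x, y, w, h = mean_shift(prob, window)
    area = prob[y:y + h, x:x + w].sum()        # proxy for face size, hence Z

    # Re-size the next search window to roughly 2 * sqrt(area), as in the
    # block diagram; the face X, Y come from the converged window center.
    s = max(4, int(round(2.0 * np.sqrt(area))))
    cx, cy = x + w // 2, y + h // 2
    next_window = (max(0, cx - s // 2), max(0, cy - s // 2), s, s)
    return (cx, cy, area), next_window

In the full system, Z would be inferred from the area and head roll from the second-order moments of the probability distribution under the window; both are omitted here for brevity.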
References

[1] Thomas Ertl et al. Computer Graphics: Principles and Practice, 3rd Edition, 2014.
[2] M. Carter. Computer graphics: Principles and practice, 1997.
[3] Christoph von der Malsburg et al. Tracking and learning graphs and pose on image sequences of faces. Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, 1996.
[4] Alex Pentland et al. Pfinder: Real-Time Tracking of the Human Body. IEEE Trans. Pattern Anal. Mach. Intell., 1997.
[5] Dorin Comaniciu et al. Robust analysis of feature spaces: color image segmentation. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1997.
[6] Azriel Rosenfeld et al. Computer Vision. Adv. Comput., 1988.
[7] Alex Waibel et al. Face locating and tracking for human-computer interaction. Proceedings of the 1994 28th Asilomar Conference on Signals, Systems and Computers, 1994.
[8] Alex Pentland et al. Visually Controlled Graphics. IEEE Trans. Pattern Anal. Mach. Intell., 1993.
[9] Yizong Cheng. Mean Shift, Mode Seeking, and Clustering, 1995.
[10] Yizong Cheng et al. Mean Shift, Mode Seeking, and Clustering. IEEE Trans. Pattern Anal. Mach. Intell., 1995.
[11] Kazuo Kyuma et al. Computer vision for computer games. Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, 1996.
[12] Alex Pentland et al. View-based and modular eigenspaces for face recognition. Proceedings of the 1994 IEEE Conference on Computer Vision and Pattern Recognition, 1994.
[13] Ioannis Pitas et al. Segmentation and tracking of faces in color images. Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, 1996.
[14] Jack-Gérard Postaire et al. Catching moving objects with snakes for motion tracking. Pattern Recognit. Lett., 1995.
[15] Alvy Ray Smith et al. Color gamut transform pairs. SIGGRAPH, 1978.
[16] Michael Isard et al. Contour Tracking by Stochastic Propagation of Conditional Density. ECCV, 1996.
[17] Paul W. Fieguth et al. Color-based tracking of heads and other mobile objects at video frame rates. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1997.