Visual face tracking and its applications
Human faces convey very important information, e.g., identity, emotion, and focus of attention, in daily human communication.
People have therefore long contemplated the possibility of automatically analyzing human faces for convenient human-computer interaction. Compared with other body parts, the human face is the part most conveniently captured by a video camera, and it is the most statistically consistent in color, shape, and texture for modeling. As the computational power of machines increases and efficient face analysis techniques emerge, automatic detection and tracking of human faces in video is becoming practical for these purposes.
This dissertation develops algorithms for tracking human faces in video in two scenarios: (1) human-computer interaction (HCI), and (2) meeting room video analysis (MRVA). In the HCI scenario, the face usually appears close to the camera at high resolution. We explore Active Shape Model (ASM) techniques for localizing the 2D facial features in a CONDENSATION framework; the 3D face location and orientation can then be inferred using an optical-flow-based 3D model face tracker. In the MRVA scenario, the faces are usually far from the camera and the resolution is low. Tensor techniques are explored for localizing the faces, and various techniques, e.g., mean-shift tracking, annealed particle filtering, and online model updating in a generative Bayesian model and in subspaces, are explored for tracking the 2D head locations and 3D head poses. The focus of attention of the meeting attendees can then be inferred from the head pose.

Because many inference tasks depend on object appearances cropped according to the tracking result, which is usually noisy due to outliers and imperfect models, we also explore eliminating the appearance inconsistency caused by misalignments by simultaneously refining PCA models from the data using variational message passing (VMP) techniques. Based on the algorithms we have developed, we demonstrate a camera mouse in the HCI scenario and present experiments on meeting room video indexing and retrieval in the MRVA scenario.
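To make the CONDENSATION-style tracking mentioned above concrete, the sketch below shows a minimal particle-filter loop for a 2D face position. It is illustrative only, not the dissertation's implementation: the (x, y) state, random-walk dynamics, and the hypothetical appearance_likelihood score stand in for the ASM-based observation model used in the actual tracker.

import numpy as np

def appearance_likelihood(particles, target, sigma=15.0):
    """Hypothetical likelihood: score particles by distance to a known target.
    A real tracker would use an ASM fit or an appearance-matching score."""
    d2 = np.sum((particles - target) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def condensation_step(particles, weights, target, motion_std=5.0):
    # 1) Resample particles in proportion to their previous weights.
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    # 2) Predict: propagate with a simple random-walk dynamic model.
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)
    # 3) Measure: re-weight by the appearance likelihood and normalize.
    weights = appearance_likelihood(particles, target)
    weights /= weights.sum()
    return particles, weights

if __name__ == "__main__":
    np.random.seed(0)
    n = 500
    particles = np.random.uniform(0, 320, size=(n, 2))  # initial guesses in a 320x240 frame
    weights = np.full(n, 1.0 / n)
    true_face = np.array([160.0, 120.0])                # simulated face location
    for t in range(20):
        particles, weights = condensation_step(particles, weights, true_face)
    estimate = np.average(particles, axis=0, weights=weights)
    print("estimated face position:", estimate)

The weighted mean of the particle set serves as the point estimate of the face position at each frame; in the dissertation's setting the same predict-measure-resample loop is driven by ASM-based observations rather than the toy likelihood assumed here.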