In recent years, because cameras have become inexpensive and ever more prevalent, there has been increasing interest in modeling human shape and motion from image data. This type of modeling has many applications, such as electronic publishing, entertainment, sports medicine and athletic training. It is, however, an inherently difficult task, both because the body is very complex and because the data that can be extracted from images is often incomplete, noisy and ambiguous. EPFL's Computer Vision Laboratory seeks to overcome these difficulties by using facial and body animation models, not only to represent the data, but also to guide the fitting process, thereby substantially improving performance. Starting from sophisticated 3-D animation models, we reformulate them so that they can be used for data analysis in the three following research areas.

1 Augmented reality and 3-D tracking

In augmented reality applications, tracking and registration of cameras and objects are required because, to combine real and rendered scenes, we must project synthetic models at the right location in real images. As shown in Fig. 1, we have developed robust real-time methods for 3-D tracking of rigid objects and human faces [9, 10]. We formulate the tracking problem in terms of local bundle adjustment and merge the information from preceding frames with that provided by a very limited number of keyframes created during a training stage, which results in a real-time tracker that does not jitter or drift and can deal with significant aspect changes. We have also developed a fast 3-D object detection and pose estimation method [4, 5] which can be used to initialize or reinitialize the tracker in real time.
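The idea of merging preceding-frame and keyframe correspondences in a single least-squares pose refinement can be illustrated with a toy sketch. This is a minimal 2-D analogue, not the papers' full 3-D bundle-adjustment formulation: it refines a planar rigid pose (rotation angle plus translation) by Gauss-Newton over residuals stacked from both correspondence sources, with the keyframe term weighted by `w_key` (all names here are illustrative assumptions).

```python
import numpy as np

def refine_pose(model_pts, obs_prev, obs_key, w_key=1.0, iters=20):
    """Gauss-Newton refinement of a 2-D rigid pose (theta, tx, ty).

    Toy analogue of local bundle adjustment: residuals from the
    previous frame and from an offline keyframe are stacked into one
    least-squares problem, so neither source alone dictates the pose.
    """
    theta, tx, ty = 0.0, 0.0, 0.0
    for _ in range(iters):
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        pred = model_pts @ R.T + np.array([tx, ty])
        # stacked residuals: previous-frame matches + weighted keyframe matches
        r = np.concatenate([(pred - obs_prev).ravel(),
                            w_key * (pred - obs_key).ravel()])
        # Jacobian of each predicted point w.r.t. (theta, tx, ty)
        dR = np.array([[-s, -c], [c, -s]])  # dR/dtheta
        dtheta = model_pts @ dR.T
        n = len(model_pts)
        J = np.zeros((2 * n, 3))
        J[:, 0] = dtheta.ravel()
        J[0::2, 1] = 1.0  # d(pred_x)/d(tx)
        J[1::2, 2] = 1.0  # d(pred_y)/d(ty)
        J_full = np.vstack([J, w_key * J])
        step, *_ = np.linalg.lstsq(J_full, -r, rcond=None)
        theta, tx, ty = theta + step[0], tx + step[1], ty + step[2]
    return theta, tx, ty
```

In the real system the unknowns are 6-DOF camera poses and the residuals are image reprojection errors, but the structure is the same: the keyframe term anchors the estimate and suppresses drift, while the preceding-frame term keeps it smooth and jitter-free.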
This detection method relies on matching keypoints but, in contrast to previous approaches that depend either on ad hoc local descriptors or on estimating local affine deformations, the wide-baseline matching of these keypoints is treated as a classification problem, in which each class corresponds to the set of all possible views of such a point. We synthesize a large number of views of individual keypoints of the object and train a classifier to recognize them. At run time, we rely on this classifier to decide to which class, if any, an observed feature belongs. This formulation allows us to use powerful and fast classification methods to reduce matching error rates.
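The train-on-synthesized-views idea can be sketched in a few lines. This is a deliberately simplified stand-in, assuming square grayscale patches: it synthesizes views with 90-degree rotations plus noise (the real system uses random affine warps) and classifies with a nearest-class-mean rule rather than the randomized classification trees of the papers; every name below is illustrative.

```python
import numpy as np

def synthesize_views(patch, n_views, rng):
    """Generate training views of a keypoint patch under simple
    appearance changes (90-degree rotations plus additive noise), a
    crude stand-in for the random affine warps used in the real system."""
    views = []
    for _ in range(n_views):
        v = np.rot90(patch, k=rng.integers(0, 4)).astype(float)
        v += rng.normal(0.0, 0.1, v.shape)
        views.append(v.ravel())
    return np.array(views)

class KeypointClassifier:
    """Wide-baseline matching cast as classification: one class per
    model keypoint, trained offline on synthesized views.  Here a
    nearest-class-mean classifier replaces the randomized trees."""

    def fit(self, patches, n_views=50, seed=0):
        rng = np.random.default_rng(seed)
        # one mean appearance vector per keypoint class
        self.means = np.array([
            synthesize_views(p, n_views, rng).mean(axis=0)
            for p in patches])
        return self

    def predict(self, patch):
        # at run time: assign the observed feature to the nearest class
        d = np.linalg.norm(self.means - patch.ravel()[None, :], axis=1)
        return int(np.argmin(d))
```

The point of the formulation is that all the expensive work (synthesizing views, training) happens offline; at run time, matching reduces to evaluating a fast classifier per detected feature.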
[1] P. Fua, et al. Towards Recognizing Feature Points using Classification Trees. 2004.
[2] Vincent Lepetit, et al. Fusing online and offline information for stable 3D tracking in real-time. CVPR 2003.
[3] Pascal Fua, et al. Hierarchical Implicit Surface Joint Limits to Constrain Video-Based Motion Capture. ECCV 2004.
[4] Matthew Turk, et al. A Morphable Model For The Synthesis Of 3D Faces. SIGGRAPH 1999.
[5] Pascal Fua, et al. 3D Human Body Tracking Using Deterministic Temporal Motion Models. ECCV 2004.
[6] Pascal Fua, et al. Accurate face models from uncalibrated and ill-lit video sequences. CVPR 2004.
[7] Pascal Fua, et al. Articulated Soft Objects for Multiview Shape and Motion Capture. IEEE Trans. Pattern Anal. Mach. Intell., 2003.
[8] Vincent Lepetit, et al. Stable real-time 3D tracking using online and offline information. IEEE Trans. Pattern Anal. Mach. Intell., 2004.
[9] Vincent Lepetit, et al. Point matching as a classification problem for fast and robust object pose estimation. CVPR 2004.
[10] Thomas Vetter, et al. A morphable model for the synthesis of 3D faces. SIGGRAPH 1999.
[11] Vincent Lepetit, et al. Markov-based Silhouette Extraction for Three-Dimensional Body Tracking in Presence of Cluttered Background. BMVC 2004.