Modeling people: Vision-based understanding of a person's shape, appearance, movement, and behaviour

Automatic understanding of people from images has become an increasingly central focus of research in computer vision over the past two decades. Tracking, reconstruction and recognition of human movement from video in general scenes remain challenging problems. Research in this field is motivated by the diverse potential applications of such technology for human–machine interfaces, visual surveillance, biometrics, content production, medical diagnosis and visual search. Since the previous special issue of CVIU on modeling people was published in 2001 [1], there have been substantial advances. This special issue brings together 10 papers that illustrate the advances over the past 5 years and highlight potential directions for future research.

In the first paper of this special issue, Moeslund et al. provide a comprehensive survey of advances in vision-based human motion capture and analysis from 2000 to 2006. This builds on the earlier paper of Moeslund and Granum [2], which reviewed research up to 2000. The latest review identifies over 300 publications in the principal vision conferences and journals during this 5-year period. A number of significant advances in reliable tracking, pose estimation and movement recognition are identified. In particular, there has been significant progress in automatic human pose estimation from monocular image sequences. Methods for automatic pose initialisation and robust optimisation, together with the use of learnt motion models, have increased the robustness of video-based pose estimation. Movement recognition has also become an area of increasing interest over the past few years. Progress has been made in the recognition of simple actions and the description of action grammars. Understanding behaviour and action remains an open problem for future research.

The regular papers in this special issue present contributions in each of the research areas identified in the survey: tracking, pose estimation, and action recognition.
The papers are primarily invited submissions, extended to journal length, based on work presented by the authors at the International Conference on Computer Vision 2005 and the associated workshop, Modelling People and Human Interaction.