Guest Editors' Introduction to the Special Issue on Multimodal Human Pose Recovery and Behavior Analysis

HUMAN Pose Recovery and Behavior Analysis (HuPBA) is one of the most challenging topics in Computer Vision, Pattern Analysis, and Machine Learning. It is of critical importance for application areas that include gaming, human-computer interaction, human-robot interaction, security, commerce, assistive technologies and rehabilitation, sports, sign language recognition, and driver assistance technology, to mention just a few. In essence, HuPBA requires dealing with the articulated nature of the human body, changes in appearance due to clothing, and the inherent problems of cluttered scenes, such as background artifacts, occlusions, and illumination changes. Given these inherent difficulties, the combination of alternative, complementary visual and non-visual modalities coming from different types of sensors has drawn a lot of attention in the literature: data from visual cameras such as RGB, time-of-flight (ToF), infrared, light-field, multispectral, underwater, and thermal cameras, together with non-visual sensors such as audio signals, inertial measurement unit (IMU) data, electrodermal activity responses, and electroglottograph signals, among others, have been exploited and combined to estimate pose, gestures, and behavior in both single images and image sequences. The combination of these visual and non-visual modalities has increased the accuracy of computer vision approaches, although it gives rise to new challenges in feature extraction, synchronization of data coming from different sensors, data fusion, and time series analysis.
As Guest Editors of this Special Issue on Multimodal Human Pose Recovery and Behavior Analysis (M2HuPBA), we are happy to present 16 accepted papers that represent the most recent research in this field, including new methods considering still images, image sequences, depth data, stereo vision, 3D vision, audio, and IMUs, among others, as well as new multimodal datasets, in addition to those proposed by the ChaLearn Looking at People series of workshops. We would like to thank the authors of the 57 submissions we received and, above all, to acknowledge the outstanding and timely work performed by the reviewers. All 57 submissions underwent a rigorous TPAMI review process, with at least three external reviewers providing reviews for each paper. We would also like to thank the Editor-in-Chief (EiC), David Forsyth, for making this special issue possible. We are also grateful to the editorial staff for managing the submission process and providing us with assistance. The 16 accepted papers can be grouped into three main categories within M2HuPBA: (i) human pose recovery and tracking; (ii) action and gesture recognition; and (iii) datasets. We describe these next.