Special issue on multi-camera and multi-modal sensor fusion

Advances in sensing and communication technologies, as well as the increasing availability of computational power, have favoured the emergence of applications based on systems that combine multiple cameras and, possibly, other sensing modalities. These applications include multi-sensor dynamic scene analysis, immersive human–computer interfaces, automated filtering, summarisation and distribution of multi-sensor data, and applications in smart environments, healthcare, automotive, sports, and teleconferencing. This Special Issue covers the state of the art and recent advances in several aspects of multi-sensor detection, tracking, and planning, and their applications. Several important topics are addressed, such as the optimal monitoring of a wide area with a limited number of sensors, the management of information flow and decisions across the network, the combination of uncalibrated moving platforms with fixed sensors, the selection of relevant portions of the captured data for display to a user, the pose estimation of objects, and the persistent labelling of moving objects across the monitored site.

Pan-tilt-zoom cameras can be used to actively observe complex areas with a relatively small number of sensors. In “Exploiting Distinctive Visual Landmark Maps in Pan-Tilt-Zoom Camera Networks”, sensor slaving is used to direct attention to events of interest, and people are monitored across a wide area using cooperative tracking. In particular, any camera can dynamically take the role of master or slave in the network to accomplish the tracking task.

Mobile cameras can be used in combination with fixed cameras (e.g., the existing CCTV infrastructure). In “Cascade of Descriptors to Detect and Track Objects across any Network of Cameras”, a master–slave control mechanism is used for mobile and fixed uncalibrated cameras whose goal is to detect and track objects. Mobile cameras act as slaves, whereas fixed cameras act as masters. In the specific application of this framework, people are detected through grids of region descriptors organised in a cascade.

“Vision and RFID Data Fusion for Tracking People in Crowds by a Mobile Robot” addresses the problem of fusing heterogeneous data from a pan-tilt camera and an omni-directional RFID detector. Multiple cues are fused in a particle filtering framework to improve the localisation of objects over time in simple indoor scenes; a minimal sketch of this style of cue fusion appears at the end of this section.

When multiple cameras with overlapping fields of view are used, the 3D pose of an object can be estimated from the image sequences. In “Pose Estimation from Multiple Cameras Based on Sylvester’s Equation”, the authors present a distributed solution to the pose estimation problem that is robust to errors caused by occlusions (the classical form of Sylvester’s equation is recalled at the end of this section).

The information captured by multiple cameras can be analysed to produce automated, television-like camera switching and zooming on interesting portions of a dynamic scene. The goal is
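To make the particle filtering framework mentioned above more concrete, the following is a minimal, self-contained sketch of fusing a precise camera cue with a coarse, range-only RFID cue in a particle filter. It illustrates the general technique only, not the paper’s implementation: the motion model, the likelihood functions, and all names are hypothetical.

```python
import numpy as np

# Illustrative multi-cue particle filter, in the spirit of the vision + RFID
# fusion described above. All models here are hypothetical placeholders.
rng = np.random.default_rng(0)

N = 500                                        # number of particles
particles = rng.uniform(0, 10, size=(N, 2))    # hypothesised (x, y) positions
weights = np.full(N, 1.0 / N)

def vision_likelihood(p, cam_obs, sigma=0.5):
    # Likelihood of a pan-tilt camera position observation: sharp (small sigma).
    d = np.linalg.norm(p - cam_obs, axis=1)
    return np.exp(-0.5 * (d / sigma) ** 2)

def rfid_likelihood(p, tag_range, sigma=1.5):
    # An omni-directional RFID reading constrains range only, so it is a
    # much broader (less informative) cue than the camera.
    d = np.linalg.norm(p, axis=1)              # range from a reader at the origin
    return np.exp(-0.5 * ((d - tag_range) / sigma) ** 2)

def step(particles, weights, cam_obs, tag_range):
    # Predict: diffuse particles with a simple random-walk motion model.
    particles = particles + rng.normal(0, 0.2, particles.shape)
    # Update: fuse the two cues by multiplying their likelihoods, which
    # assumes the sensors' errors are conditionally independent.
    weights = weights * vision_likelihood(particles, cam_obs) \
                      * rfid_likelihood(particles, tag_range)
    weights = weights / weights.sum()
    # Resample to concentrate particles on high-probability hypotheses.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

particles, weights = step(particles, weights,
                          cam_obs=np.array([4.0, 6.0]), tag_range=7.0)
print(particles.mean(axis=0))                  # fused position estimate
```

Multiplying the two likelihoods encodes the usual conditional-independence assumption on sensor noise; the broad RFID likelihood mainly prunes hypotheses that the camera alone cannot disambiguate, for instance when the target leaves the camera’s field of view or is occluded in a crowd.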
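For readers who may not be familiar with the formulation underlying the pose estimation paper, it may help to recall the classical Sylvester equation,

$$ A X + X B = C, $$

where $A$, $B$ and $C$ are known matrices and $X$ is the unknown; it admits a unique solution exactly when $A$ and $-B$ have no eigenvalue in common. Casting the multi-camera pose constraints into this linear form allows the unknown pose to be recovered by standard linear-algebraic means; the specific construction of $A$, $B$ and $C$ from the camera measurements is particular to the paper and is not reproduced here.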