RGB-D Based Multi-attribute People Search in Intelligent Visual Surveillance

Searching for people in surveillance videos is a typical task in intelligent visual surveillance (IVS). However, current IVS techniques can hardly handle multi-attribute queries, which are a natural way of finding people in the real world. The challenge lies in extracting multiple attributes, a process that suffers from illumination changes, shadows, and complex backgrounds in real-world surveillance environments. In this paper, we investigate how these challenges can be addressed when IVS is equipped with RGB-D information obtained by an RGB-D camera. Using the RGB-D information, we propose methods that accurately and robustly segment the human region and extract three groups of attributes: biometric attributes, appearance attributes, and motion attributes. Furthermore, we introduce a novel IVS system capable of handling multi-attribute queries for searching people in surveillance videos. Experimental evaluations demonstrate the effectiveness of the proposed methods and system, as well as the promise of bringing RGB-D information into IVS.
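
To make the notion of a multi-attribute query concrete, the following minimal sketch filters per-person records on one attribute from each of the three groups named above. It is an illustration only and not the system described in the paper: the record fields (`height_cm`, `upper_color`, `speed_m_s`), the `search` helper, and the sample values are all hypothetical, and the attributes are assumed to have been extracted already.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PersonRecord:
    """One tracked person with pre-extracted attributes (all fields hypothetical)."""
    track_id: int
    height_cm: float    # biometric attribute
    upper_color: str    # appearance attribute
    speed_m_s: float    # motion attribute


Predicate = Callable[[PersonRecord], bool]


def search(records: List[PersonRecord], predicates: List[Predicate]) -> List[PersonRecord]:
    """Return the records that satisfy every predicate (a conjunctive query)."""
    return [r for r in records if all(p(r) for p in predicates)]


# Illustrative sample data, not results from the paper.
records = [
    PersonRecord(track_id=1, height_cm=182.0, upper_color="red", speed_m_s=1.2),
    PersonRecord(track_id=2, height_cm=165.0, upper_color="red", speed_m_s=3.1),
    PersonRecord(track_id=3, height_cm=185.0, upper_color="blue", speed_m_s=1.0),
]

# Query: a tall person in a red top who is walking rather than running.
query = [
    lambda r: r.height_cm >= 175.0,
    lambda r: r.upper_color == "red",
    lambda r: r.speed_m_s < 2.0,
]

matches = search(records, query)
print([r.track_id for r in matches])
```

Expressing each condition as an independent predicate keeps the three attribute groups decoupled, so a query may combine any subset of them.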
