Multi-modal user identification and object recognition surveillance system

We propose an automatic surveillance system for user identification and object recognition based on multi-modal RGB-Depth data analysis. We model an RGB-D environment by learning a pixel-based Gaussian distribution of the background. User and object candidate regions are then detected and recognized using robust statistical approaches. The system robustly recognizes users and updates itself online, detecting and identifying new actors in the scene. Moreover, segmented objects are described, matched, recognized, and updated online using viewpoint 3D descriptions, which makes the approach robust to partial occlusions and local 3D viewpoint rotations. Finally, the system keeps a history of user-object assignments, which is especially useful in surveillance scenarios. The system has been evaluated on a novel data set containing different indoor/outdoor scenarios, objects, and users, showing accurate recognition and better performance than standard state-of-the-art approaches.
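The pixel-based Gaussian background model mentioned above can be illustrated with a minimal sketch. The abstract does not give the model's details, so everything here is an assumption made for illustration: the function names `fit_background` and `foreground_mask`, the treatment of the R, G, B, and depth channels as independent, the deviation threshold `k`, and the `MIN_STD` floor.

```python
import math

# Assumed floor on the standard deviation, so that perfectly static pixels
# do not produce a zero-width Gaussian (hypothetical value).
MIN_STD = 1.0


def fit_background(frames):
    """Fit an independent Gaussian (mean, std) per pixel and per channel.

    frames: list of frames; each frame is a list of rows, each row a list of
    pixels, and each pixel a tuple of channel values such as (R, G, B, D).
    """
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    channels = len(frames[0][0][0])
    mean = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    std = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for c in range(channels):
                values = [frame[y][x][c] for frame in frames]
                mu = sum(values) / n
                var = sum((v - mu) ** 2 for v in values) / n
                mean[y][x][c] = mu
                std[y][x][c] = max(math.sqrt(var), MIN_STD)
    return mean, std


def foreground_mask(frame, mean, std, k=2.5):
    """Flag a pixel as foreground if any channel deviates by more than k std."""
    mask = []
    for y, row in enumerate(frame):
        mask_row = []
        for x, pixel in enumerate(row):
            is_foreground = any(
                abs(value - mean[y][x][c]) > k * std[y][x][c]
                for c, value in enumerate(pixel)
            )
            mask_row.append(is_foreground)
        mask.append(mask_row)
    return mask
```

In this sketch, `fit_background` would be called on a set of frames of the empty scene, and `foreground_mask` on each new frame. Connected regions of flagged pixels would then serve as the user and object candidates passed on to recognition. Because depth is one of the channels, a pixel can be flagged by a change in distance even when its colour matches the background.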
