Robot Vision System for Real-Time Human Detection and Action Recognition

Mobile robots equipped with camera sensors must perceive surrounding humans and their actions in order to navigate autonomously and safely; these tasks are known as human detection and action recognition. In this paper, moving humans are the target objects. Compared with general computer vision, real-time performance is more critical for robot vision. To address this challenge, we propose a robot vision system in which images described by optical flow are used as input. To classify humans and their actions in the input images, we use a Convolutional Neural Network (CNN) rather than hand-coded invariant features. Moreover, we present a novel detector, the local search window, which clips partial images around target objects. Finally, experiments show that the proposed robot vision system can detect a moving human and recognize the action in real time. A minimal pipeline sketch is given below.
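The following is a minimal sketch of the kind of pipeline the abstract describes: dense optical flow is computed between consecutive frames (e.g., with OpenCV's Farnebäck method), a patch is clipped around a candidate window, and a small CNN classifies the flow patch. The window-clipping logic, patch size, and network layout are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch only: flow-image input + CNN classifier, assuming a given search window.
import cv2
import numpy as np
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Toy CNN over 2-channel flow patches; the paper's architecture is not specified here."""
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

def flow_patch(prev_gray, curr_gray, window, size=64):
    """Compute dense optical flow (Farneback) and clip the patch inside `window` (x, y, w, h)."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    x, y, w, h = window                       # hypothetical local search window
    patch = flow[y:y + h, x:x + w]            # (h, w, 2) flow vectors
    patch = cv2.resize(patch, (size, size))   # fixed CNN input size
    return torch.from_numpy(patch).permute(2, 0, 1).unsqueeze(0).float()

if __name__ == "__main__":
    net = SmallCNN()
    prev = np.random.randint(0, 255, (240, 320), dtype=np.uint8)  # stand-in frames
    curr = np.random.randint(0, 255, (240, 320), dtype=np.uint8)
    x = flow_patch(prev, curr, window=(100, 60, 80, 120))
    with torch.no_grad():
        print(net(x).argmax(dim=1))           # predicted action class index
```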
