Hand-Raising Gesture Detection in Real Classroom

This paper proposes a novel method for hand-raising detection in real classroom environments. Unlike traditional motion detection, hand-raising detection in a real classroom is challenging because of complex scenes, varied gestures, and low resolution. To address these challenges, we first build a large-scale hand-raising data set from thirty primary and middle schools in Shanghai, China. We then propose an improved R-FCN. Specifically, we first design an automatic detection-template algorithm to handle the various hand-raising gestures. Second, to better detect small hands, we introduce a feature pyramid that simultaneously captures detailed and highly semantic features. With these two strategies incorporated into a basic R-FCN architecture, our model achieves impressive results in real classroom scenarios. In extensive testing, the hand-raising detection accuracy reaches 85% on average, which is sufficient for practical application.
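The feature-pyramid idea mentioned above can be illustrated with a minimal sketch. The abstract does not specify the pyramid's design, so the code below is only a generic top-down fusion under stated assumptions: single-channel feature maps stored as nested lists, each level half the resolution of the previous one, nearest-neighbour upsampling, and element-wise addition. The names `upsample2x`, `add_maps`, and `top_down_merge` are hypothetical, and the learned convolutions that a real detector would apply at each level are omitted.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map (rows x cols)


def upsample2x(fmap: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling: each value becomes a 2x2 patch."""
    out: Grid = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two feature maps of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature maps must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_merge(levels: List[Grid]) -> List[Grid]:
    """Fuse coarse, semantic maps into finer, detailed ones.

    `levels` is ordered from finest to coarsest (assumption: each level
    has exactly half the height and width of the one before it).
    Returns the fused maps in the same fine-to-coarse order.
    """
    merged = [levels[-1]]  # the coarsest map is passed through unchanged
    for fine in reversed(levels[:-1]):
        merged.append(add_maps(fine, upsample2x(merged[-1])))
    merged.reverse()
    return merged


if __name__ == "__main__":
    fine = [[1.0] * 4 for _ in range(4)]      # high resolution, fine detail
    mid = [[10.0] * 2 for _ in range(2)]
    coarse = [[100.0]]                        # low resolution, strong semantics
    fused = top_down_merge([fine, mid, coarse])
    print(len(fused[0]), len(fused[0][0]))    # the finest fused map stays 4x4
```

In this sketch the finest fused map keeps its full resolution while accumulating information from every coarser level, which is the property that makes such pyramids attractive for small objects such as distant hands.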
