Enabling Early Gesture Recognition by Motion Augmentation

In real-time gesture recognition, classifying a gesture early, while it is only partially observed, minimizes latency and improves the user experience. This work investigates a novel approach for improving the accuracy of an early gesture classification model. The input sequence of human poses from a partially observed gesture is augmented with a series of poses predicted by an auxiliary recurrent neural network sequence-to-sequence motion prediction model, and the combined sequence is then fed into a random forest gesture classifier. Concatenating the partially observed ground-truth sequence with the forecasted motion sequence significantly improves early gesture recognition accuracy. When 25 future frames are forecast from a partially observed input gesture sequence of 50 frames, recognition accuracy on the MSRC-12 gesture dataset improves from 45% to 87% on average.
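The augmentation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the recurrent sequence-to-sequence predictor is replaced here by a simple constant-velocity extrapolation (a hypothetical stand-in), the pose dimensionality of 60 (20 joints with 3 coordinates each) is an assumption, and flattening the augmented sequence into a single feature vector is one possible way to prepare input for a random forest. Only the frame counts (50 observed, 25 forecast) come from the text.

```python
import numpy as np

OBSERVED_FRAMES = 50  # partially observed input length, from the abstract
FORECAST_FRAMES = 25  # number of forecast frames, from the abstract
POSE_DIM = 60         # assumption: 20 joints x 3 coordinates per frame


def forecast_motion(observed, n_future):
    """Hypothetical stand-in for the RNN seq2seq motion predictor.

    Extrapolates the last observed frame-to-frame velocity linearly.
    """
    velocity = observed[-1] - observed[-2]
    steps = np.arange(1, n_future + 1).reshape(-1, 1)
    return observed[-1] + steps * velocity


def augment_sequence(observed, n_future=FORECAST_FRAMES):
    """Concatenate the observed poses with the forecast poses along time."""
    predicted = forecast_motion(observed, n_future)
    return np.concatenate([observed, predicted], axis=0)


def to_feature_vector(sequence):
    """Flatten a (frames, pose_dim) sequence into one vector (assumed encoding)."""
    return sequence.reshape(-1)


# Synthetic poses used only to exercise the shapes; not real gesture data.
rng = np.random.default_rng(0)
observed = rng.normal(size=(OBSERVED_FRAMES, POSE_DIM))

augmented = augment_sequence(observed)
features = to_feature_vector(augmented)
print(augmented.shape, features.shape)
```

In a full pipeline, feature vectors built this way for each training gesture would be passed to a random forest classifier, and `forecast_motion` would be replaced by the trained motion prediction model.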
