Towards End-to-End Gesture Recognition with Recurrent Neural Networks

With the development of smart devices, gesture recognition is being used in a growing number of fields, yet the gesture recognition devices currently on the market are inconvenient and expensive. Human motion analysis and recognition based on attitude sensors is a new field. Algorithms based on recurrent neural networks take the timing information of actions into account and can better resolve the temporal uncertainty of human motion, but their efficiency drops as the number of training samples increases. This paper proposes an action recognition method based on connectionist temporal classification (CTC) for sequence learning. The method achieves end-to-end recognition of gestures.
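
As a rough illustration of how CTC allows a recurrent network to output a gesture sequence without frame-level segmentation, the sketch below applies the standard CTC greedy decoding rule: merge consecutive repeated frame labels, then drop the blank symbol. The function name, the choice of 0 as the blank index, and the integer gesture labels are assumptions made for this example; they are not taken from the paper.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_decode(frame_labels: Sequence[int], blank: int = 0) -> List[int]:
    """Collapse a per-frame label path into a gesture label sequence.

    frame_labels: the most likely label at each time step (hypothetical
                  output of an RNN over attitude-sensor frames).
    blank:        index reserved for the CTC blank symbol (assumed to be 0).
    """
    # Step 1: merge runs of identical consecutive labels.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove blanks, which only separate repeated gestures.
    return [label for label in merged if label != blank]


# Eight sensor frames: blank, gesture 1 (two frames), blank,
# gesture 1 again, gesture 2 (two frames), blank.
path = [0, 1, 1, 0, 1, 2, 2, 0]
decoded = ctc_greedy_decode(path)
print(decoded)
```

The blank between the two runs of label 1 is what lets the decoder keep both occurrences of that gesture instead of merging them into one.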
