Recognition of Human Actions Through Deep Neural Networks for Multimedia Systems Interaction

Nowadays, interactive multimedia systems are part of everyday life. The most common way to interact with and control these devices is through remote controls or some sort of touch panel. In recent years, thanks to the introduction of reliable, low-cost Kinect-like sensing technology, increasing attention has been devoted to touchless interfaces. A Kinect-like device can be positioned on top of a multimedia system, detect a person in front of it, and process skeletal data, optionally together with RGB-D data, to recognize the user's gestures. These gestures can then be used to control, for example, a media device. Despite the strong interest in this area, no consumer system currently uses this type of interaction, probably because of the inherent difficulty of processing the raw data coming from Kinect cameras to infer user intentions. In this work, we considered the use of neural networks taking only Kinect skeletal data as input for the task of user intention classification. We compared different deep networks and analyzed their outputs.
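To make the skeletal-data pipeline concrete, the following is a minimal sketch of how a clip of Kinect skeletal frames could be flattened and passed through a small feedforward classifier. All names, dimensions, and the gesture classes here are illustrative assumptions for exposition, not the networks evaluated in this work (Kinect v2 tracks 25 joints, each as an (x, y, z) position; the tiny numpy MLP stands in for the deep networks compared in the paper):

```python
import numpy as np

# Hypothetical setup: Kinect v2 tracks 25 joints, each with (x, y, z) coords.
N_JOINTS = 25
N_FRAMES = 30   # length of one gesture clip (assumed)
N_CLASSES = 4   # e.g. swipe-left, swipe-right, push, wave (illustrative labels)

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()              # numerical stability
    e = np.exp(z)
    return e / e.sum()

class SkeletonMLP:
    """Toy feedforward classifier over a flattened skeletal sequence."""

    def __init__(self, n_in, n_hidden, n_out):
        # Small random weights; in practice these would be learned.
        self.W1 = rng.standard_normal((n_in, n_hidden)) * 0.01
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.standard_normal((n_hidden, n_out)) * 0.01
        self.b2 = np.zeros(n_out)

    def predict_proba(self, clip):
        # clip: (N_FRAMES, N_JOINTS, 3) array of joint positions
        x = clip.reshape(-1)                   # flatten the whole sequence
        h = np.tanh(x @ self.W1 + self.b1)     # hidden layer
        return softmax(h @ self.W2 + self.b2)  # class probabilities

model = SkeletonMLP(N_FRAMES * N_JOINTS * 3, 64, N_CLASSES)
clip = rng.standard_normal((N_FRAMES, N_JOINTS, 3))  # fake skeletal data
probs = model.predict_proba(clip)
```

A recurrent model such as an LSTM would instead consume the clip frame by frame, which better suits gestures of varying duration; the flattening above is only the simplest possible baseline.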
