DeepDynamicHand: A Deep Neural Architecture for Labeling Hand Manipulation Strategies in Video Sources Exploiting Temporal Information

Humans are capable of complex manipulation interactions with the environment, relying on the intrinsic adaptability and compliance of their hands. Recently, soft robotic manipulation has attempted to reproduce this behavior through the design of deformable yet robust end-effectors. To this end, the investigation of human behavior has become crucial to inform the technological development of robotic hands that can exploit environmental constraints as humans do. Among the tools robotics can leverage to achieve this objective, deep learning has emerged as a promising approach for studying neuroscientific observations and then implementing them on the artificial side. However, current approaches tend to neglect the dynamic nature of hand pose recognition problems, which limits their effectiveness in identifying the sequences of manipulation primitives that underpin action generation, e.g., during purposeful interaction with the environment. In this work, we propose a vision-based supervised Hand Pose Recognition method which, for the first time, takes temporal information into account to identify meaningful sequences of actions in grasping and manipulation tasks. More specifically, we apply Deep Neural Networks to automatically learn features from hand posture images, i.e., frames extracted from videos of grasping and manipulation tasks involving objects and external environmental constraints. For training purposes, videos are divided into intervals, each associated with a specific action by a human supervisor. The proposed algorithm combines a Convolutional Neural Network, which detects the hand within each video frame, with a Recurrent Neural Network, which predicts the hand action in the current frame while taking into account the history of actions performed in the previous frames.
Experimental validation was performed on two datasets of dynamic hand-centric strategies, in which subjects regularly interact with objects and the environment. The proposed architecture achieved high classification accuracy on both datasets, reaching up to 94% and outperforming state-of-the-art techniques. The outcomes of this study can be applied to robotics, e.g., for the planning and control of soft anthropomorphic manipulators.
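
The labeling scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the per-frame feature vectors are random stand-ins for what a CNN hand detector might produce, the recurrent step is a plain tanh cell with untrained random weights rather than any specific architecture from the paper, and the class names ("reach", "grasp", "slide") and all function names are hypothetical. It only shows how supervisor-annotated intervals become per-frame labels and how a recurrent pass emits one prediction per frame while carrying a hidden state across frames.

```python
import math
import random


def expand_intervals(intervals, n_frames, background="none"):
    """Turn (start, end, label) intervals (end exclusive) into one label per frame."""
    labels = [background] * n_frames
    for start, end, label in intervals:
        for t in range(start, min(end, n_frames)):
            labels[t] = label
    return labels


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def recurrent_labeler(features, Wx, Wh, Wo):
    """Plain recurrent pass: returns one class distribution per frame.

    The hidden state h summarizes the frames seen so far, so the
    prediction at frame t depends on the history, not only on frame t.
    """
    h = [0.0] * len(Wh)
    outputs = []
    for x in features:
        pre = [p + q for p, q in zip(matvec(Wx, x), matvec(Wh, h))]
        h = [math.tanh(v) for v in pre]
        outputs.append(softmax(matvec(Wo, h)))
    return outputs


def frame_accuracy(predicted, target):
    """Fraction of frames whose predicted label matches the annotation."""
    hits = sum(1 for p, t in zip(predicted, target) if p == t)
    return hits / len(target)


def rand_matrix(rng, rows, cols):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy setup: 6 frames, 4-dimensional features, 3 action classes.
rng = random.Random(0)
n_frames, n_feat, n_hidden = 6, 4, 3
classes = ["reach", "grasp", "slide"]  # illustrative names, not from the paper

# Annotation by a human supervisor, expressed as frame intervals.
frame_labels = expand_intervals(
    [(0, 2, "reach"), (2, 5, "grasp"), (5, 6, "slide")], n_frames
)

# Stand-in for CNN features extracted from the detected hand region.
features = [[rng.uniform(-1, 1) for _ in range(n_feat)] for _ in range(n_frames)]

Wx = rand_matrix(rng, n_hidden, n_feat)
Wh = rand_matrix(rng, n_hidden, n_hidden)
Wo = rand_matrix(rng, len(classes), n_hidden)

probs = recurrent_labeler(features, Wx, Wh, Wo)
predicted = [classes[p.index(max(p))] for p in probs]
accuracy = frame_accuracy(predicted, frame_labels)
```

Because the weights here are random, the resulting accuracy is meaningless; in a real pipeline the recurrent weights would be trained against the per-frame labels, and the accuracy would be computed on held-out videos.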
