Multimodal Fusion of EMG and Vision for Human Grasp Intent Inference in Prosthetic Hand Control

For lower-arm amputees, robotic prosthetic hands offer the promise of regaining the ability to perform fine object manipulation in activities of daily living. Accurate inference of the human's intended gesture to control a robotic prosthetic hand is vital to the efficacy of the solution. Current control methods based on physiological signals such as electroencephalography (EEG) and electromyography (EMG) are prone to yielding poor inference outcomes due to motion artifacts, variability of skin-electrode junction impedance over time, muscle fatigue, and other factors. Vision sensors are a major source of information about the environment state and can play a vital role in inferring feasible and intended gestures. However, visual evidence is susceptible to its own artifacts, most often due to object occlusion, lighting changes, and the variable appearance of objects across viewing angles, among other factors. Multimodal evidence fusion using physiological and vision sensor measurements is a natural approach due to the complementary strengths of these modalities. In this paper, we present a Bayesian evidence fusion framework for grasp intent inference using eye-view video, gaze, and forearm EMG, each processed by neural network models. We analyze individual and fused performance as a function of time as the hand approaches the object to grasp it. To this end, we have also developed novel data-processing and augmentation techniques to train the neural network components. Our experimental data analyses demonstrate that EMG and visual evidence show complementary strengths, and as a consequence, fusion of multimodal evidence can outperform each individual evidence modality at any given time. Specifically, results indicate that, on average, fusion improves instantaneous upcoming grasp-type classification accuracy during the reaching phase by 13.66% and 14.8% relative to EMG and visual evidence alone, respectively. An overall fusion accuracy of 95.3% among 13 labels (compared to a chance level of 7.7%) is achieved, and more detailed analysis indicates that the correct grasp is inferred sufficiently early, and with high confidence relative to the top contender, to allow successful robot actuation and closure of the loop.
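
To make the fusion idea concrete, the sketch below shows one minimal way to combine per-frame grasp-type posteriors from an EMG classifier and a vision classifier under a conditional-independence assumption and a uniform prior. The function name, the uniform prior, and the illustrative probabilities are assumptions for exposition, not the paper's exact model or numbers.

```python
import numpy as np

N_GRASPS = 13  # number of grasp-type labels considered in the paper


def fuse_posteriors(p_emg, p_vision, prior=None):
    """Fuse class posteriors from the EMG and vision models for one frame.

    p_emg, p_vision : arrays of shape (N_GRASPS,), each summing to 1.
    prior           : optional prior over grasp types (uniform if None).
    """
    if prior is None:
        prior = np.full(N_GRASPS, 1.0 / N_GRASPS)
    # Assuming the two evidence streams are conditionally independent given
    # the true grasp type, the fused posterior is proportional to the
    # product of per-modality posteriors divided by the prior.
    fused = p_emg * p_vision / prior
    return fused / fused.sum()


# Hypothetical example: vision is ambiguous between two grasps,
# while EMG evidence breaks the tie in favor of class 0.
p_vision = np.full(N_GRASPS, 0.01)
p_vision[0], p_vision[1] = 0.44, 0.45
p_emg = np.full(N_GRASPS, 0.02)
p_emg[0], p_emg[1] = 0.50, 0.28
print(np.argmax(fuse_posteriors(p_emg, p_vision)))  # -> 0
```

In practice such a fusion rule would be applied at each frame of the reaching phase, so the fused posterior can be tracked over time as the hand approaches the object.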
