Progress and Prospects of Multimodal Fusion Methods in Physical Human–Robot Interaction: A Review

Recent advances in physical Human-Robot Interaction (pHRI) have demonstrated the potential and feasibility of robot systems that collaborate with humans actively and safely. This creates a strong demand for real-time perception of the dynamic whole formed by the robot, the human, the environment and the manipulated objects. A single sensory modality is inadequate for a robust estimate of the state of this whole. This paper provides a comprehensive review of state-of-the-art multimodal fusion methods, examining their classification, feasibility and challenges on the premise that pHRI involves spatio-temporal sharing and is inherently multimodal and multi-task. Finally, several research directions are recommended based on the discussion of multimodal fusion methods.
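
To make the motivation for fusion concrete, the following minimal sketch (not taken from the paper; the sensor names and numbers are illustrative assumptions) shows the standard inverse-variance weighting of two independent measurements of the same quantity. The fused variance is never larger than that of either modality alone, which is the basic statistical argument for combining modalities rather than relying on a single sensor.

```python
# Minimal sketch: fusing two independent, noisy modalities (e.g. a camera
# and a tactile array estimating the same scalar, such as a contact
# position in cm) by inverse-variance weighting. Names and values are
# illustrative assumptions, not part of the reviewed paper.

def fuse_two_modalities(z_vision, var_vision, z_tactile, var_tactile):
    """Fuse two independent scalar measurements by inverse-variance weighting."""
    w_vision = 1.0 / var_vision
    w_tactile = 1.0 / var_tactile
    z_fused = (w_vision * z_vision + w_tactile * z_tactile) / (w_vision + w_tactile)
    var_fused = 1.0 / (w_vision + w_tactile)  # <= min(var_vision, var_tactile)
    return z_fused, var_fused

if __name__ == "__main__":
    z, var = fuse_two_modalities(z_vision=5.2, var_vision=0.40,
                                 z_tactile=4.9, var_tactile=0.10)
    print(f"fused estimate = {z:.2f} cm, fused variance = {var:.2f}")
    # Fused variance 0.08 is smaller than 0.10 (tactile) and 0.40 (vision).
```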
