Hand-Object Contact Force Estimation from Markerless Visual Tracking

We consider the problem of estimating realistic contact forces during manipulation from vision alone, validated against ground-truth force measurements. Interaction forces are usually measured by mounting force transducers onto the manipulated objects or the hands. These sensors are costly, cumbersome, and alter the objects' physical properties as well as the way they are perceived by the human sense of touch. Our work establishes that interaction forces can be estimated reliably, cost-effectively, and non-intrusively from vision. The problem is inherently challenging: in multi-contact, a given motion can generally be produced by infinitely many force distributions. To overcome the limitations of traditional models based on inverse optimization, we collect and release the first large-scale dataset on manipulation kinodynamics: 3.2 hours of synchronized force and motion measurements spanning 193 object-grasp configurations. Using recurrent neural networks (RNNs), we learn a mapping from high-level kinematic features, derived from the equations of motion, to the underlying manipulation forces. The RNN predictions are then refined for physical consistency through a physics-based optimization formulated as a second-order cone program (SOCP). We show that, from a single RGB-D camera, our method captures interaction forces compatible with both the visual observations and the way humans intuitively manipulate objects.
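
The SOCP refinement step lends itself to a compact illustration. Below is a minimal sketch, assuming a tracked rigid object with known mass, contact points, and contact normals, of how RNN-predicted forces could be projected onto the set of contact forces that satisfy the object's Newton-Euler equations and Coulomb friction cones, using CVXPY. This is not the authors' implementation; the function and variable names (refine_forces, f_rnn, mu, ...) are illustrative assumptions.

```python
# A minimal sketch of physics-based force refinement via SOCP.
import numpy as np
import cvxpy as cp

def skew(r):
    """Skew-symmetric matrix so that skew(r) @ f == np.cross(r, f)."""
    return np.array([[0.0, -r[2], r[1]],
                     [r[2], 0.0, -r[0]],
                     [-r[1], r[0], 0.0]])

def refine_forces(f_rnn, contacts, normals, mass, com_accel, ang_residual, mu=0.8):
    """
    f_rnn        : (K, 3) contact forces predicted by the RNN
    contacts     : (K, 3) contact points relative to the object's center of mass
    normals      : (K, 3) unit inward contact normals
    com_accel    : (3,)  linear acceleration of the center of mass (from tracking)
    ang_residual : (3,)  angular-dynamics term I*dw/dt + w x (I*w)
    """
    K = f_rnn.shape[0]
    gravity = np.array([0.0, 0.0, -9.81])
    f = cp.Variable((K, 3))                          # refined contact forces

    constraints = [
        # Newton: contact forces plus gravity drive the translational motion.
        cp.sum(f, axis=0) + mass * gravity == mass * com_accel,
        # Euler: net contact torque about the COM matches the angular residual.
        sum(skew(contacts[i]) @ f[i] for i in range(K)) == ang_residual,
    ]
    for i in range(K):
        n = normals[i]
        fn = n @ f[i]                                # normal component
        ft = (np.eye(3) - np.outer(n, n)) @ f[i]     # tangential component
        # Coulomb friction cone: fn >= 0 and ||ft|| <= mu * fn.
        constraints += [fn >= 0, cp.SOC(mu * fn, ft)]

    # Among all physically consistent force distributions, stay closest
    # to the RNN prediction.
    problem = cp.Problem(cp.Minimize(cp.sum_squares(f - f_rnn)), constraints)
    problem.solve()                                  # any SOCP-capable solver
    return f.value
```

Because the friction cones are second-order cone constraints and the objective is convex, the refinement has a unique solution close to the learned prediction, which is what makes a per-frame physics-consistency pass practical.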
