Inferring Forces and Learning Human Utilities from Videos

We propose a notion of affordance that takes into account the physical quantities generated when the human body interacts with real-world objects. We also introduce a learning framework that incorporates the concept of human utilities, which in our opinion provides a deeper and finer-grained account not only of object affordance but also of people's interaction with objects. Rather than defining affordance in terms of the geometric compatibility between body poses and 3D objects, we devise algorithms that employ physics-based simulation to infer the relevant forces and pressures acting on body parts. By observing the choices people make in videos, particularly in selecting a chair in which to sit, our system learns the comfort intervals of the forces exerted on body parts while sitting. We account for people's preferences in terms of human utilities, which go beyond comfort intervals to also cover meaningful tasks within scenes and spatiotemporal constraints in motion planning, for example in robot task planning.
