RPM-Net

We introduce RPM-Net, a deep learning-based approach that simultaneously infers movable parts and hallucinates their motions from a single, un-segmented, and possibly partial, 3D point cloud shape. RPM-Net is a novel Recurrent Neural Network (RNN), composed of an encoder-decoder pair with interleaved Long Short-Term Memory (LSTM) components, which together predict a temporal sequence of pointwise displacements for the input point cloud. At the same time, the displacements allow the network to learn movable parts, resulting in a motion-based shape segmentation. Applying RPM-Net recursively to the obtained parts can predict finer-level part motions, resulting in a hierarchical object segmentation. Furthermore, we develop a separate network to estimate part mobilities, i.e., per-part motion parameters, from the segmented motion sequence. Both networks learn deep predictive models from a training set that exemplifies a variety of mobilities for diverse objects. We show results of simultaneous motion and part prediction on synthetic and real scans of 3D objects exhibiting a variety of part mobilities, possibly involving multiple movable parts.
