Occlusion-aware multi-view reconstruction of articulated objects for manipulation

We present an algorithm called Procrustes-Lo-RANSAC (PLR) to recover complete 3D models of articulated objects. Structure-from-motion techniques are used to capture 3D point cloud models of an object in two different configurations. Procrustes analysis, combined with a locally optimized RANSAC sampling strategy, facilitates a straightforward geometric approach to recovering the joint axes, as well as classifying them automatically as either revolute or prismatic. With the resulting articulated model, a robotic system is then able to manipulate the object along its joint axes at a specified grasp point in order to exercise its degrees of freedom. Because the models capture all sides of the object, they are occlusion-aware, meaning that the robot has knowledge of parts of the object that are not visible in the current view. Our algorithm does not require prior knowledge of the object, nor does it make any assumptions about the planarity of the object or scene. Experiments with a PUMA 500 robotic arm demonstrate the effectiveness of the approach on a variety of real-world objects containing both revolute and prismatic joints.
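The geometric idea in the abstract, a Procrustes fit inside a RANSAC loop followed by a revolute-versus-prismatic decision, can be illustrated with a minimal sketch. This is not the authors' PLR implementation. It assumes that point correspondences between the two configurations are already available, that the two point clouds have been aligned on the static part of the object, and that the moving part actually moved. The function names (`procrustes_rigid`, `ransac_rigid`, `classify_joint`), the thresholds, and the simplified local-optimization step (refitting on the current inliers) are all hypothetical choices made for illustration.

```python
import numpy as np


def procrustes_rigid(P, Q):
    """Least-squares rigid fit: returns (R, t) such that Q ~= P @ R.T + t.

    P and Q are (N, 3) arrays of corresponding points (orthogonal
    Procrustes / Kabsch solution, with a reflection guard).
    """
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # proper rotation, det = +1
    t = cq - R @ cp
    return R, t


def ransac_rigid(P, Q, n_iters=200, inlier_tol=0.01, seed=0):
    """RANSAC over 3-point samples with a simplified local-optimization step."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(P), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(P), size=3, replace=False)
        R, t = procrustes_rigid(P[idx], Q[idx])
        inliers = np.linalg.norm(P @ R.T + t - Q, axis=1) < inlier_tol
        # Local optimization (simplified): refit on the inliers and re-score.
        if inliers.sum() >= 3:
            R, t = procrustes_rigid(P[inliers], Q[inliers])
            inliers = np.linalg.norm(P @ R.T + t - Q, axis=1) < inlier_tol
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("no rigid motion found with at least 3 inliers")
    R, t = procrustes_rigid(P[best], Q[best])
    return R, t, best


def classify_joint(R, t, angle_tol_deg=5.0):
    """Label a rigid motion as prismatic or revolute and return its axis.

    Returns (kind, direction, point_on_axis); point_on_axis is None for a
    prismatic joint. Assumes the part moved (t != 0 when R is the identity).
    """
    cos_angle = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.degrees(np.arccos(cos_angle))
    if angle < angle_tol_deg:
        # Negligible rotation: treat the motion as a pure translation.
        return "prismatic", t / np.linalg.norm(t), None
    # Rotation axis from the skew-symmetric part of R
    # (ill-conditioned when the angle approaches 180 degrees).
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    axis = axis / np.linalg.norm(axis)
    # A point c on the axis satisfies (I - R) c = t. I - R has rank 2, so the
    # minimum-norm solution is the point on the axis closest to the origin.
    point = np.linalg.lstsq(np.eye(3) - R, t, rcond=1e-6)[0]
    return "revolute", axis, point
```

In use, `P` and `Q` would hold the matched 3D points of the moving part in the first and second configuration; `ransac_rigid(P, Q)` returns the motion and an inlier mask, and `classify_joint(R, t)` turns that motion into a joint type and axis. The 5-degree threshold and the 0.01 inlier tolerance are placeholders that depend on the scale and noise of the reconstruction.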
