Fast object localization and pose estimation in heavy clutter for robotic bin picking

We present a practical vision-based robotic bin-picking system that performs detection and three-dimensional pose estimation of objects in an unstructured bin using a novel camera design, picks up parts from the bin, and performs error detection and pose correction while the part is in the gripper. Two main innovations enable our system to achieve real-time robust and accurate operation. First, we use a multi-flash camera that extracts robust depth edges. Second, we introduce an efficient shape-matching algorithm called fast directional chamfer matching (FDCM), which is used to reliably detect objects and estimate their poses. FDCM improves the accuracy of chamfer matching by including edge orientation. It also achieves massive improvements in matching speed using line-segment approximations of edges, a three-dimensional distance transform, and directional integral images. We empirically show that these speedups, combined with the use of bounds in the spatial and hypothesis domains, give the algorithm sublinear computational complexity. We also apply our FDCM method to other applications in the context of deformable and articulated shape matching. In addition to significantly improving upon the accuracy of previous chamfer matching methods in all of the evaluated applications, FDCM is up to two orders of magnitude faster than the previous methods.

[1]  Yoshiaki Shirai,et al.  Guiding a robot by visual feedback in assembling tasks , 1973, Pattern Recognit..

[2]  Robert C. Bolles,et al.  Parametric Correspondence and Chamfer Matching: Two New Techniques for Image Matching , 1977, IJCAI.

[3]  W. Grimson,et al.  Model-Based Recognition and Localization from Sparse Range or Tactile Data , 1984 .

[4]  John F. Canny,et al.  A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Berthold K. P. Horn,et al.  Closed-form solution of absolute orientation using unit quaternions , 1987 .

[6]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[7]  David G. Lowe,et al.  Three-Dimensional Object Recognition from Single Two-Dimensional Images , 1987, Artif. Intell..

[8]  David G. Lowe,et al.  Fitting Parameterized Three-Dimensional Models to Images , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Gérard G. Medioni,et al.  Structural Indexing: Efficient 3-D Object Recognition , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[10]  Pradeep K. Khosla,et al.  Robotic visual servoing and robotic assembly tasks , 1996, IEEE Robotics Autom. Mag..

[11]  Clark F. Olson,et al.  Automatic target recognition by matching oriented edge pixels , 1997, IEEE Trans. Image Process..

[12]  Dariu Gavrila,et al.  Multi-feature hierarchical template matching using distance transforms , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[13]  Andrew E. Johnson,et al.  Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[15]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[16]  Jitendra Malik,et al.  Estimating Human Body Configurations Using Shape Context Matching , 2002, ECCV.

[17]  Richard Szeliski,et al.  High-accuracy stereo depth maps using structured light , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[18]  Björn Stenger,et al.  Shape context and chamfer matching in cluttered scenes , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[19]  Ronen Basri,et al.  Lambertian Reflectance and Linear Subspaces , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[20]  Jitendra Malik,et al.  Recovering human body configurations: combining segmentation and recognition , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[21]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[22]  Ramesh Raskar,et al.  Non-photorealistic camera: depth edge detection and stylized rendering using multi-flash imaging , 2004, SIGGRAPH 2004.

[23]  Jitendra Malik,et al.  Shape matching and object recognition using low distortion correspondences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[24]  Zhengyou Zhang,et al.  Iterative point matching for registration of free-form curves and surfaces , 1994, International Journal of Computer Vision.

[25]  Antonio Torralba,et al.  Learning hierarchical models of scenes, objects, and parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[26]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[27]  Larry S. Davis,et al.  Model-based object pose in 25 lines of code , 1992, International Journal of Computer Vision.

[28]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion , 2006 .

[29]  Luc Van Gool,et al.  Object Detection by Contour Segment Networks , 2006, ECCV.

[30]  Michael J. Black,et al.  Measure Locally, Reason Globally: Occlusion-sensitive Articulated Pose Estimation , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[31]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[32]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[33]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Ankur Agarwal,et al.  Recovering 3D human pose from monocular images , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Juan Carlos Niebles,et al.  A Hierarchical Model of Shape and Appearance for Human Action Classification , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Haibin Ling,et al.  Shape Classification Using the Inner-Distance , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Joshua D. Schwartz,et al.  Hierarchical Matching of Deformable Shapes , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Frédéric Jurie,et al.  Groups of Adjacent Contour Segments for Object Detection , 2008, IEEE Trans. Pattern Anal. Mach. Intell..

[39]  Fatih Murat Porikli,et al.  Pedestrian Detection via Classification on Riemannian Manifolds , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Andrew Blake,et al.  Multiscale Categorical Object Recognition Using Contour Fragments , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[42]  Anurag Mittal,et al.  Multi-stage Contour Based Detection of Deformable Objects , 2008, ECCV.

[43]  Jianbo Shi,et al.  Contour Context Selection for Object Detection: A Set-to-Set Contour Matching Approach , 2008, ECCV.

[44]  Cordelia Schmid,et al.  Bandit Algorithms for Tree Search , 2007, UAI.

[45]  H. Bischof,et al.  Fast human detection in crowded scenes by contour integration and local shape estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  Stefan Carlsson,et al.  Automatic learning and extraction of multi-local features , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[47]  Ko Nishino,et al.  Scale-hierarchical 3D object recognition in cluttered scenes , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[48]  Ramesh Raskar,et al.  Vision-guided Robot System for Picking Objects by Casting Shadows , 2010, Int. J. Robotics Res..

[49]  Nassir Navab,et al.  Model globally, match locally: Efficient and robust 3D object recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[50]  Rama Chellappa,et al.  Fast directional chamfer matching , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[51]  Rama Chellappa,et al.  Pose estimation in heavy clutter using a multi-flash camera , 2010, 2010 IEEE International Conference on Robotics and Automation.

[52]  Sebastian Thrun,et al.  3D shape scanning with a time-of-flight camera , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[53]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[54]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[55]  Daniel P. Huttenlocher,et al.  Distance Transforms of Sampled Functions , 2012, Theory Comput..

[56]  A. Monks Objects , 2013 .