Teaching Dual-arm Robots to Do Object Assembly using Multi-modal 3D Vision

The motivation of this paper is to develop a smart system that uses multi-modal vision for next-generation mechanical assembly. The system has two phases: in the first, a human teaches the assembly structure to a robot; in the second, the robot finds the objects, then grasps and assembles them using AI planning. The crucial part of the system is the precision of 3D visual detection, and the paper presents multi-modal approaches to meet this requirement. AR markers are used in the teaching phase, since the human can actively control the process. Point cloud matching and geometric constraints are used in the robot execution phase to avoid unexpected noise. Experiments are performed to examine the precision and correctness of the approaches. The study is practical: the developed approaches are integrated with graph model-based motion planning, implemented on an industrial robot, and applicable to real-world scenarios.
