Robotic grasping in multi-object stacking scenes based on visual reasoning

Vision is one of the most important means of solving the robotic grasping problem. We propose a framework, based on visual reasoning, that helps a robot grasp a target object in multi-object scenes. The framework has two stages: perception and execution. The perception stage consists of two parts: visual manipulation relationship reasoning and robotic grasp detection. For visual manipulation relationship reasoning, we propose the Visual Manipulation Relationship Network (VMRN), which simultaneously detects objects and infers the manipulation relationship between each pair of objects. We design an Object Pairing Pooling Layer that enables end-to-end training of object detection and visual manipulation relationship reasoning in VMRN, which makes the algorithm faster and more robust. For robotic grasp detection, we propose a fully convolutional grasp detection network based on oriented anchor boxes, which performs real-time grasp detection for arbitrary objects and achieves state-of-the-art results on the standard Cornell Grasp Dataset. In the execution stage, the grasp point and grasp vector are first computed in the camera coordinate frame by combining depth information with the perception results. They are then transformed into the robot coordinate frame so that the robot can execute the grasping motion. Experimental results show that our framework enables the robot to grasp the target in multi-object scenes, removing objects in the correct order.
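The execution stage described above can be illustrated with a minimal sketch. It assumes a pinhole camera model and a known 4x4 homogeneous transform from the camera frame to the robot frame. The function names, the intrinsics (fx, fy, cx, cy), the example transform, and the choice of the optical axis as the grasp vector are all illustrative assumptions; the abstract does not specify how these quantities are obtained.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Matrix4 = Sequence[Sequence[float]]


def deproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Back-project pixel (u, v) with depth in metres into the camera frame.

    Assumes an ideal pinhole camera without lens distortion.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def transform_point(T: Matrix4, p: Vec3) -> Vec3:
    """Apply a homogeneous transform T (rotation and translation) to point p."""
    return tuple(
        T[i][0] * p[0] + T[i][1] * p[1] + T[i][2] * p[2] + T[i][3]
        for i in range(3)
    )


def rotate_vector(T: Matrix4, v: Vec3) -> Vec3:
    """Apply only the rotation part of T to a direction vector v."""
    return tuple(
        T[i][0] * v[0] + T[i][1] * v[1] + T[i][2] * v[2]
        for i in range(3)
    )


# Hypothetical camera intrinsics (pixels); real values come from calibration.
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

# Hypothetical extrinsics: a camera mounted 0.8 m above the table, looking
# straight down, so the camera z axis points along the robot's -z axis.
ROBOT_FROM_CAMERA = [
    [1.0, 0.0, 0.0, 0.4],
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.8],
    [0.0, 0.0, 0.0, 1.0],
]

# Grasp centre in the image (from the grasp detector) and its measured depth.
grasp_point_cam = deproject(380.0, 240.0, 0.5, FX, FY, CX, CY)

# Assumption: the gripper approaches along the camera's optical axis.
grasp_vector_cam = (0.0, 0.0, 1.0)

grasp_point_robot = transform_point(ROBOT_FROM_CAMERA, grasp_point_cam)
grasp_vector_robot = rotate_vector(ROBOT_FROM_CAMERA, grasp_vector_cam)
```

The point is transformed with the full matrix, while the grasp vector uses only the rotation part, because a direction is unaffected by translation. In practice the extrinsic transform would come from a hand-eye calibration rather than being written by hand.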