Semantically Meaningful View Selection

An understanding of the nature of objects could help robots both to solve high-level abstract tasks and to improve performance on lower-level concrete tasks. Although deep learning has facilitated progress in image understanding, a robot's performance on problems like object recognition often depends on the angle from which the object is observed. Robot sorting tasks have traditionally relied on a fixed top-down view of each object; by changing its viewing angle, a robot can select a more semantically informative view, leading to better object recognition. In this paper, we introduce the problem of semantic view selection, which seeks to find good camera poses for gaining semantic knowledge about an observed object. We propose a conceptual formulation of the problem, together with a solvable relaxation based on clustering. We then present a new image dataset consisting of around 10k images representing various views of 144 objects under different poses. Finally, we use this dataset to propose a first solution to the problem: training a neural network to predict a “semantic score” from a top-view image and a candidate camera pose. The views predicted to have higher scores are then shown to yield better clustering results than fixed top-down views.
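To make the scoring idea concrete, below is a minimal PyTorch sketch of a network that fuses features from a pretrained CNN backbone with a low-dimensional camera-pose vector to regress a scalar score. This is an illustrative assumption, not the authors' implementation: the class name SemanticScoreNet, the pose_dim parameter, the VGG16 backbone choice, and the elevation/azimuth pose parameterization are all hypothetical.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class SemanticScoreNet(nn.Module):
    """Hypothetical scorer: regresses a scalar 'semantic score' for a
    candidate camera pose from a top-view image of the object."""

    def __init__(self, pose_dim: int = 2):
        super().__init__()
        # Pretrained VGG16 convolutional layers as the image encoder
        backbone = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
        self.features = backbone.features
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        # Small MLP head fusing image features with the candidate pose
        self.head = nn.Sequential(
            nn.Linear(512 * 7 * 7 + pose_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
            nn.Sigmoid(),  # squash the predicted score into [0, 1]
        )

    def forward(self, top_view: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        # top_view: (B, 3, 224, 224); pose: (B, pose_dim), e.g. (elevation, azimuth)
        feats = self.pool(self.features(top_view)).flatten(1)
        return self.head(torch.cat([feats, pose], dim=1))
```

At inference time, such a scorer could be evaluated over a grid of candidate poses, with the camera moved to the highest-scoring one; the grid ranges below are likewise assumed for illustration:

```python
# Usage sketch: score a grid of candidate poses for one top-view image
net = SemanticScoreNet().eval()
top_view = torch.rand(1, 3, 224, 224)    # placeholder top-view image
candidates = torch.cartesian_prod(
    torch.linspace(0.2, 1.4, steps=5),   # elevation (rad), assumed range
    torch.linspace(0.0, 6.28, steps=8),  # azimuth (rad)
)
with torch.no_grad():
    scores = net(top_view.expand(len(candidates), -1, -1, -1), candidates)
best_pose = candidates[scores.argmax()]  # pose predicted most informative
```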
