Next-Best-View Estimation based on Deep Reinforcement Learning for Active Object Classification

The presentation and analysis of image data from a single viewpoint are often not sufficient to solve a task; several viewpoints may be necessary to obtain more information. The next-best-view problem is to find the viewpoint that offers the greatest information gain for the underlying task. In this work, a robot arm holds an object in its end-effector and searches for a sequence of next-best views to identify the object unambiguously. We use Soft Actor-Critic (SAC), a deep reinforcement learning method, to learn these next-best views for a specific set of objects. The evaluation shows that an agent can learn to determine the object pose to which the robot arm should move the object. The resulting viewpoint yields a more accurate prediction and thereby better distinguishes the object from other objects. We make the code publicly available to the scientific community to support reproducibility.
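To make the notion of "greatest information gain" concrete, the following minimal sketch scores candidate views by how much they would reduce the entropy of a classifier's belief over object classes, and picks the best one greedily. It is an illustration only and not the SAC-based method of this work: the function names, the view labels, and the assumption that a predicted class posterior is available for every candidate view are all hypothetical.

```python
import math
from typing import Dict, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def information_gain(prior: Sequence[float], posterior: Sequence[float]) -> float:
    """Entropy reduction when the belief moves from prior to posterior."""
    return entropy(prior) - entropy(posterior)


def greedy_next_best_view(
    prior: Sequence[float],
    predicted_posteriors: Dict[str, Sequence[float]],
) -> str:
    """Return the candidate view whose predicted posterior gives the largest gain.

    ASSUMPTION: predicted_posteriors maps a hypothetical view name to the class
    distribution a classifier is expected to output from that view.
    """
    return max(
        predicted_posteriors,
        key=lambda view: information_gain(prior, predicted_posteriors[view]),
    )


# Hypothetical belief over three object classes before moving the object.
prior = [0.40, 0.35, 0.25]

# Hypothetical predicted beliefs for three candidate object poses.
candidates = {
    "front": [0.50, 0.30, 0.20],
    "side": [0.90, 0.05, 0.05],
    "top": [0.34, 0.33, 0.33],
}

best_view = greedy_next_best_view(prior, candidates)
print(best_view)  # side
```

A greedy one-step rule like this needs the posteriors for all candidate views in advance, which is rarely available in practice. A learned policy such as the SAC agent described above instead maps the current observation directly to a target object pose.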
