Paper information - Next-Best-View Estimation based on Deep Reinforcement Learning for Active Object Classification

Presenting and analyzing image data from a single viewpoint is often not sufficient to solve a task; several viewpoints are needed to obtain more information. The next-best-view problem is to find the viewpoint with the greatest information gain for the underlying task. In this work, a robot arm holds an object in its end-effector and searches for a sequence of next-best-views to explicitly identify the object. We use Soft Actor-Critic (SAC), a deep reinforcement learning method, to learn these next-best-views for a specific set of objects. The evaluation shows that an agent can learn to determine an object pose to which the robot arm should move the object. The resulting viewpoint yields a more accurate prediction and so better distinguishes the object from other objects. We make the code publicly available for the scientific community and for reproducibility.
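The notion of "greatest information gain" can be made concrete with a small sketch. The abstract does not define the measure, so this example assumes one common choice: the reduction in Shannon entropy of a classifier's class probabilities between the current view and a candidate view. It also enumerates candidate views greedily, which is not the learned SAC policy the paper describes. All function names, view names, and probabilities are hypothetical and serve only as illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(before, after):
    """Assumed measure: entropy reduction from the current to a candidate view."""
    return entropy(before) - entropy(after)


def pick_best_view(current, candidates):
    """Greedy stand-in for a learned policy (NOT the paper's SAC agent).

    current:    class probabilities predicted from the current view
    candidates: dict mapping a hypothetical view name to the class
                probabilities predicted from that view
    Returns the (view name, gain) pair with the largest entropy reduction.
    """
    scored = (
        (name, information_gain(current, probs))
        for name, probs in candidates.items()
    )
    return max(scored, key=lambda item: item[1])


# Made-up two-class example: the current view is maximally ambiguous.
current_view = [0.5, 0.5]
candidate_views = {
    "front": [0.5, 0.5],  # no new information
    "side": [0.9, 0.1],   # somewhat more certain
    "top": [1.0, 0.0],    # fully disambiguates the object
}
best_name, best_gain = pick_best_view(current_view, candidate_views)
```

In this toy setting the view with the most peaked prediction wins. A reinforcement learning agent, by contrast, has to propose the object pose directly without evaluating every candidate view first.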
