Robotic Object Sorting via Deep Reinforcement Learning: A Generalized Approach

This work proposes a general formulation of the object sorting problem, suitable for describing any non-deterministic environment characterized by friendly and adversarial interference. Coupled with a Deep Reinforcement Learning algorithm, this approach allows policies to be trained for different sorting tasks without adjusting the architecture or modifying the learning method. Briefly, the environment is subdivided into a clutter, where objects are freely located, and a set of clusters, where objects must be placed according to predefined ordering and classification rules. A 3D grid discretizes the environment: the properties of the object within each cell, including its category and order, define the cell's state. The problem is formulated as a Markov Decision Process: at each time step, the states of the cells fully define the state of the environment. Users can custom-define object classes, ordering priorities, and failure rules, the latter by assigning a non-uniform risk probability to each cell. Experiments successfully trained and validated a Deep Reinforcement Learning model on several sorting tasks while minimizing the number of moves and the probability of failure. The results demonstrate the system's ability to handle non-deterministic events, such as failures, and unpredictable external disturbances, such as human user interventions.
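To make the formulation concrete, the following is a minimal sketch (all names are hypothetical and not taken from the paper) of the grid-based state representation the abstract describes: each cell of the 3D grid holds the category and ordering priority of the object it contains, plus a user-assigned risk probability, and the joint state of all cells serves as the MDP state.

```python
from dataclasses import dataclass
from typing import Optional
import itertools

# Illustrative sketch: a cell stores the attributes of the object it
# contains (or None if empty) and a user-defined failure risk.
@dataclass
class Cell:
    category: Optional[int] = None   # user-defined object class; None = empty cell
    order: Optional[int] = None      # ordering priority within the class
    risk: float = 0.0                # probability that a move involving this cell fails

class GridState:
    """MDP state: the joint state of all cells fully defines the environment."""
    def __init__(self, shape, risk_map=None):
        self.shape = shape
        # Enumerate every (x, y, z) index of the 3D grid.
        self.cells = {
            idx: Cell(risk=(risk_map or {}).get(idx, 0.0))
            for idx in itertools.product(*(range(n) for n in shape))
        }

    def place(self, idx, category, order):
        """Put an object with the given class and priority into a cell."""
        self.cells[idx].category = category
        self.cells[idx].order = order

    def as_tuple(self):
        """Hashable encoding of the full state, usable as an MDP state id."""
        return tuple((c.category, c.order) for c in self.cells.values())

# Usage: a 2x2x1 grid where cell (1, 1, 0) carries a higher failure risk.
state = GridState((2, 2, 1), risk_map={(1, 1, 0): 0.2})
state.place((0, 0, 0), category=1, order=0)
```

This is only one possible encoding; the paper's actual state and reward definitions may differ.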
