Adaptation of a wheel loader automatic bucket filling neural network using reinforcement learning

Bucket filling is a repetitive task in earth-moving operations with wheel loaders, and it needs to be automated to enable efficient remote control and autonomous operation. Ideally, an automated bucket-filling solution should work across different machine-pile environments with a minimum of manual retraining. It has been shown that, for a given machine-pile environment, a time-delay neural network can fill the bucket efficiently after imitation learning from 100 examples provided by a single expert operator. Can such a bucket-filling network be automatically adapted to different machine-pile environments, without further imitation learning, by optimizing a utility or reward function? This paper investigates the use of a deterministic actor-critic reinforcement learning algorithm for automatically adapting a neural network to a new pile environment. The algorithm is used to adapt a bucket-filling network trained on medium-coarse gravel to a cobble-gravel pile environment. The experiments are performed with a Volvo L180H wheel loader in a real-world setting. We find that the bucket weights in the new pile environment improve by five to ten percent within one hour of reinforcement learning, using fewer than 40 bucket-filling trials. This result was obtained after investigating two different reward functions motivated by domain knowledge.
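
To make the adaptation procedure concrete, the sketch below shows a minimal DDPG-style deterministic actor-critic update of the kind the abstract refers to. It is an illustrative assumption, not the paper's implementation: the state and action dimensions, network sizes, hyperparameters, and the reward signal are all placeholders, and details such as target networks and a replay buffer are omitted for brevity.

```python
# Minimal sketch of a deterministic actor-critic (DDPG-style) update.
# All dimensions, architectures, and hyperparameters below are assumed
# for illustration only; the paper's actual setup is not reproduced here.
import torch
import torch.nn as nn

STATE_DIM = 8    # assumed: e.g. lift/tilt pressures, speeds, recent control history
ACTION_DIM = 2   # assumed: e.g. lift and tilt lever commands

class Actor(nn.Module):
    """Deterministic policy: maps a machine state to a continuous control action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Tanh())  # actions scaled to [-1, 1]

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Action-value function Q(s, a) used to evaluate the actor's actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

actor, critic = Actor(), Critic()
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma = 0.99  # assumed discount factor

def update(batch):
    """One actor-critic update from a batch of logged bucket-filling transitions."""
    s, a, r, s_next, done = batch  # tensors collected during filling trials

    # Critic: regress Q(s, a) toward the one-step bootstrapped target.
    with torch.no_grad():
        target = r + gamma * (1 - done) * critic(s_next, actor(s_next))
    critic_loss = nn.functional.mse_loss(critic(s, a), target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: deterministic policy gradient -- adjust the policy to increase
    # the critic's value of the actions it selects.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```

In this reading, the actor plays the role of the pretrained bucket-filling network being adapted, and the reward would be derived from domain knowledge (for example, a function of the achieved bucket weight), as the abstract indicates; the specific reward functions used in the paper are not reconstructed here.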
