Adaptation of a wheel loader automatic bucket filling neural network using reinforcement learning

Bucket filling is a repetitive task in earth-moving operations with wheel loaders, and it needs to be automated to enable efficient remote control and autonomous operation. Ideally, an automated bucket-filling solution should work across different machine-pile environments with a minimum of manual retraining. It has been shown that, for a given machine-pile environment, a time-delay neural network can fill the bucket efficiently after imitation learning from 100 examples provided by a single expert operator. Can such a bucket-filling network be automatically adapted to different machine-pile environments, without further imitation learning, by optimizing a utility or reward function? This paper investigates the use of a deterministic actor-critic reinforcement learning algorithm for automatically adapting a neural network to a new pile environment. The algorithm is used to adapt a bucket-filling network trained on medium-coarse gravel to a cobble-gravel pile environment. The experiments are performed with a Volvo L180H wheel loader in a real-world setting. We find that the bucket weights in the new pile environment improve by five to ten percent within one hour of reinforcement learning, using fewer than 40 bucket-filling trials. This result was obtained after investigating two different reward functions motivated by domain knowledge.
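
To make the adaptation procedure concrete, the sketch below shows a minimal DDPG-style deterministic actor-critic update of the kind the abstract refers to. It is an illustrative assumption, not the paper's implementation: the state and action dimensions, network sizes, hyperparameters, and the reward signal are all placeholders, and details such as target networks and a replay buffer are omitted for brevity.

```python
# Minimal sketch of a deterministic actor-critic (DDPG-style) update.
# All dimensions, architectures, and hyperparameters below are assumed
# for illustration only; the paper's actual setup is not reproduced here.
import torch
import torch.nn as nn

STATE_DIM = 8    # assumed: e.g. lift/tilt pressures, speeds, recent control history
ACTION_DIM = 2   # assumed: e.g. lift and tilt lever commands

class Actor(nn.Module):
    """Deterministic policy: maps a machine state to a continuous control action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Tanh())  # actions scaled to [-1, 1]

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Action-value function Q(s, a) used to evaluate the actor's actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

actor, critic = Actor(), Critic()
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma = 0.99  # assumed discount factor

def update(batch):
    """One actor-critic update from a batch of logged bucket-filling transitions."""
    s, a, r, s_next, done = batch  # tensors collected during filling trials

    # Critic: regress Q(s, a) toward the one-step bootstrapped target.
    with torch.no_grad():
        target = r + gamma * (1 - done) * critic(s_next, actor(s_next))
    critic_loss = nn.functional.mse_loss(critic(s, a), target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: deterministic policy gradient -- adjust the policy to increase
    # the critic's value of the actions it selects.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```

In this reading, the actor plays the role of the pretrained bucket-filling network being adapted, and the reward would be derived from domain knowledge (for example, a function of the achieved bucket weight), as the abstract indicates; the specific reward functions used in the paper are not reconstructed here.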
