Learning to Operate an Excavator via Policy Optimization

Abstract: This paper presents a case study on optimizing a deep neural network policy to control an excavator and perform bucket leveling. The policy mimics human operator behavior, a task that traditional control algorithms struggle with because of the unstructured earthmoving environment and the excavator's system dynamics. The approach integrates a proprietary simulator, Dynasty, with the OpenAI Gym framework. By exposing the simulation engine through an OpenAI Gym-compatible interface, we benchmarked several reinforcement learning algorithms on the excavator bucket-leveling control problem. The paper reports the experimental results and discusses techniques for finding policies that converge to smooth machine operation.
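To illustrate the kind of integration the abstract describes, the following is a minimal sketch of how a proprietary simulator could be exposed through the classic OpenAI Gym interface. The Dynasty bindings, the action and observation layouts, and the reward shaping shown here are assumptions for illustration only, not the interface or reward used in the paper.

```python
# Sketch: wrapping a simulator as an OpenAI Gym environment (classic Gym API).
# All Dynasty-specific details below are hypothetical placeholders.
import numpy as np
import gym
from gym import spaces


class ExcavatorLevelingEnv(gym.Env):
    """Bucket-leveling task exposed through the standard Gym interface."""

    def __init__(self):
        super().__init__()
        # Assumed action set: normalized commands for boom, arm, and bucket
        # actuators in [-1, 1].
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
        # Assumed observation: joint angles, joint velocities, and bucket-tip
        # height error relative to the target grading plane.
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(8,), dtype=np.float32
        )
        self.sim = None  # handle to the (hypothetical) simulator session

    def reset(self):
        # The real wrapper would reinitialize the Dynasty model here;
        # this placeholder just returns a zeroed observation.
        self.sim = object()
        return np.zeros(self.observation_space.shape, dtype=np.float32)

    def step(self, action):
        action = np.clip(action, self.action_space.low, self.action_space.high)
        # Advance the simulator one control step (placeholder dynamics).
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        # Example shaped reward (assumed, not from the paper): penalize the
        # bucket-tip height error and penalize large commands to encourage
        # smooth machine operation.
        height_error = 0.0
        command_effort = float(np.sum(np.square(action)))
        reward = -abs(height_error) - 0.01 * command_effort
        done = False
        return obs, reward, done, {}
```

An environment structured this way can be passed directly to off-the-shelf reinforcement learning implementations that expect the Gym `reset`/`step` contract, which is what makes benchmarking several algorithms against the same task straightforward.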