Update Method of Cost Function to Learn Robust Policy Parameters

Because robot experiments are often expensive in time, money, or both, reinforcement learning of robot motions is commonly carried out in simulation rather than on the real robot. However, simulation models inevitably contain modeling errors, so a solution obtained in simulation may be inappropriate for the real robot. Many studies address this problem by performing additional learning on the real robot, but for some robots and tasks this is difficult or infeasible. Learning methods that can find a robust solution without real-robot experiments are therefore desirable. This paper proposes a novel method that updates the cost function, using simulations only, so that minimizing the cost leads to a robust solution. Because the method modifies only the cost, convergence to a solution is not an issue, unlike the existing method that is based on a similar idea. The validity of the idea is tested in simulation.