Learning a model-free robotic continuous state-action task through a contractive Q-network

In Reinforcement Learning (RL), working in high-dimensional continuous state-action spaces is a challenging problem. Q-learning can be applied in this setting within an actor-critic framework, with neural networks serving as the Function Approximators (FAs) for both the actor and the critic. Learning in this context requires many experiments in a simulated environment. To reduce the number of these experiments, this research proposes a novel method, called the contractive Q-network, for updating the critic FA (Q-network). The efficiency of the developed method is demonstrated on two illustrative examples: first the well-known puddle world, and then a Push Recovery (PR) task on a simulated humanoid robot. Results show a 20% improvement in the convergence speed of the method.
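
For orientation, the sketch below illustrates the kind of critic (Q-network) update the abstract refers to: a standard TD-style Q-learning step for a small neural-network critic over a continuous state-action pair. The network size, learning rate, and random transition are illustrative assumptions only, and the contractive modification that constitutes the proposed method is not reproduced here.

import numpy as np

# Minimal sketch (not the paper's implementation): a TD(0) critic update of the
# kind used in actor-critic Q-learning with a neural-network function approximator.
# All sizes and hyperparameters below are illustrative assumptions.

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, HIDDEN = 4, 2, 16
GAMMA, LR = 0.99, 1e-3

# One-hidden-layer critic Q(s, a) -> scalar (stand-in for the Q-network FA).
W1 = rng.normal(scale=0.1, size=(HIDDEN, STATE_DIM + ACTION_DIM))
w2 = rng.normal(scale=0.1, size=HIDDEN)

def q_value(s, a):
    """Critic value and hidden activations for a state-action pair."""
    h = np.tanh(W1 @ np.concatenate([s, a]))
    return w2 @ h, h

def critic_update(s, a, r, s_next, a_next):
    """One TD(0) gradient step on the critic parameters (in-place on W1, w2)."""
    q, h = q_value(s, a)
    q_next, _ = q_value(s_next, a_next)      # a_next would come from the actor
    target = r + GAMMA * q_next              # standard Q-learning target
    td_error = target - q
    # Gradient of 0.5 * td_error^2 w.r.t. parameters (target treated as constant).
    grad_w2 = -td_error * h
    grad_W1 = -td_error * np.outer(w2 * (1.0 - h ** 2), np.concatenate([s, a]))
    w2 -= LR * grad_w2
    W1 -= LR * grad_W1
    return td_error

# Example: one update on a random transition from a continuous state-action task.
s = rng.normal(size=STATE_DIM)
a = rng.normal(size=ACTION_DIM)
print(critic_update(s, a, r=1.0,
                    s_next=rng.normal(size=STATE_DIM),
                    a_next=rng.normal(size=ACTION_DIM)))

The proposed contractive Q-network would replace or constrain this plain gradient step; the details of that modification are given in the body of the paper rather than in this sketch.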