Learning to control a joint-driven double inverted pendulum using a nested actor/critic algorithm