A Q-learning system considering swing-up gains for controlling a parallel double inverted pendulum