Dyna-PPO reinforcement learning with Gaussian process for the continuous action decision-making in autonomous driving