Probability density estimation of the Q function for reinforcement learning