Q-learning based linear quadratic regulator with balanced exploration and exploitation for unknown systems