Reinforcement Learning with Quantum Variational Circuits