R-learning in actor-critic model offers a biologically relevant mechanism for sequential decision-making