Value-free reinforcement learning: Policy optimization as a minimal model of operant behavior