Confirmation bias optimizes reward learning