Learning a reach trajectory based on binary reward feedback