An Optimistic Approach to the Temporal Difference Error in Off-Policy Actor-Critic Algorithms