Offline Reinforcement Learning for Optimizing Production Bidding Policies