Scaling Laws for Reward Model Overoptimization