The Perils of Trial-and-Error Reward Design: Misdesign through Overfitting and Invalid Task Specifications