Aligning Agent Policy with Externalities: Reward Design via Bilevel RL