Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders