RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval