On The Robustness of Self-Supervised Representations for Spoken Language Modeling