Generative Spoken Language Model based on continuous word-sized audio tokens