Transformers can optimally learn regression mixture models