Mixhead: Breaking the low-rank bottleneck in multi-head attention language models