Gated Linear Attention Transformers with Hardware-Efficient Training