Diffuser: Efficient Transformers with Multi-hop Attention Diffusion for Long Sequences