MegaBlocks: Efficient Sparse Training with Mixture-of-Experts