MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies