Revisiting Knowledge Distillation for Autoregressive Language Models