Revisiting Checkpoint Averaging for Neural Machine Translation