Towards Making the Most of BERT in Neural Machine Translation