Revisiting Token Dropping Strategy in Efficient BERT Pretraining