MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval