Learning Semantic Features for Dense Video Captioning