BiC-Net: Learning Efficient Spatio-temporal Relation for Text-Video Retrieval