Filling the Information Gap between Video and Query for Language-Driven Moment Retrieval