Detecting Moments and Highlights in Videos via Natural Language Queries