Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models