Towards Real-Time Text2Video via CLIP-Guided, Pixel-Level Optimization