Fine-Grained Text-to-Video Temporal Grounding from Coarse Boundary