ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning