Video Super-Resolution with Spatial-Temporal Transformer Encoder