Energy-Efficient Online Scheduling of Transformer Inference Services on GPU Servers