A novel warp scheduling scheme considering long-latency operations for high-performance GPUs