ByteComp: Revisiting Gradient Compression in Distributed Training