Asynchronous SGD without gradient delay for efficient distributed training