AdaScale SGD: A Scale-Invariant Algorithm for Distributed Training