Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning