As a widely used iterative algorithm, distributed Stochastic Gradient Descent (SGD) has greatly accelerated the training of machine learning models by reducing the time spent on gradient computation. However, the large number of iterations that SGD requires usually incurs a heavy communication cost for pushing local gradients and pulling the global model, which limits further performance improvement. In this article, to reduce the number of pulling operations, we propose a novel approach named Pulling Reduction with Local Compensation (PRLC), in which each worker intermittently pulls the global model from the server and uses its local update to compensate for the gap between the local model and the global model. Our rigorous theoretical analysis shows that the convergence rate of PRLC is of the same order as that of classical synchronous SGD for both strongly convex and non-convex cases, and that PRLC scales well owing to its linear speedup with respect to the number of training nodes. Moreover, we show that PRLC admits a lower pulling frequency than pulling reduction without local compensation. Extensive experiments on various models show that our approach significantly reduces the number of pulling operations compared with state-of-the-art methods, e.g., requiring only half as many pulling operations as LAG.
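
To make the intermittent-pull idea concrete, the following minimal Python sketch simulates a single PRLC-style worker against a toy server-side model: the worker pushes a gradient every step but only pulls the global model every few steps, applying its own local update in between. All names (run_prlc_worker, pull_interval, the quadratic objective) are illustrative assumptions for exposition and do not reproduce the paper's actual implementation.

```python
import numpy as np

def grad_fn(x, rng):
    # Stochastic gradient of a toy quadratic objective f(x) = 0.5 * ||x||^2,
    # perturbed with noise to mimic a mini-batch gradient.
    return x + 0.01 * rng.standard_normal(x.shape)

def run_prlc_worker(steps=100, pull_interval=4, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x_global = np.ones(5)                # toy "server-side" global model
    x_local = x_global.copy()            # worker's local copy of the model

    for t in range(steps):
        g = grad_fn(x_local, rng)
        x_global = x_global - lr * g     # "push": server applies the worker's gradient

        if t % pull_interval == 0:
            x_local = x_global.copy()    # intermittent pull of the global model
        else:
            x_local = x_local - lr * g   # local compensation instead of pulling
    return x_local, x_global

if __name__ == "__main__":
    x_local, x_global = run_prlc_worker()
    print("local model:", x_local)
    print("global model:", x_global)
```

In this sketch, increasing pull_interval reduces pulling operations while the local compensation step keeps the worker's model progressing between pulls, which is the trade-off the convergence analysis above characterizes.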