CHEAPS2AGA: Bounding Space Usage in Variance-Reduced Stochastic Gradient Descent over Streaming Data and Its Asynchronous Parallel Variants

Stochastic Gradient Descent (SGD) is widely used to train machine learning models over large datasets, yet its slow convergence rate can be a bottleneck. Memory algorithms such as SAG and SAGA form a notable family of variance reduction techniques that accelerate the convergence of SGD. However, these algorithms must store a correction term for every training data point, and this unbounded space usage is impractical for modern large-scale applications, especially when data points arrive over time (referred to as streaming data in this paper). To overcome this weakness, this paper investigates methods that bound the space usage of state-of-the-art variance-reduced stochastic gradient descent over streaming data and presents CHEAPS2AGA. At each model update, the key idea of CHEAPS2AGA is to always reserve N random data points as samples while re-using information about past stochastic gradients across all observed data points within a bounded memory footprint. In addition, training an accurate model over streaming data requires the algorithm to be time-efficient. To accelerate the training phase, CHEAPS2AGA employs a lock-free data structure to insert new data points and remove unused ones in parallel, and updates the model parameters without any locking. We conduct comprehensive experiments comparing CHEAPS2AGA to prior algorithms suited for streaming data; the results demonstrate its practical competitiveness in terms of scalability and accuracy.
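To make the bounded-memory idea concrete, the sketch below combines reservoir sampling with a SAGA-style variance-reduced update, keeping at most N reserved data points and their stored gradients. This is a minimal illustration of the idea described in the abstract, not the authors' implementation; names such as ReservoirSAGA, grad_fn, and reservoir_size are assumptions introduced for the example, and the lock-free parallel machinery of CHEAPS2AGA is omitted.

import numpy as np

class ReservoirSAGA:
    """Minimal sketch: bounded-memory, SAGA-style variance reduction over a stream
    via reservoir sampling. Illustrative only; not the CHEAPS2AGA implementation."""

    def __init__(self, dim, reservoir_size, step_size):
        self.w = np.zeros(dim)              # model parameters
        self.N = reservoir_size             # at most N reserved points / stored gradients
        self.step_size = step_size
        self.points = []                    # reserved (x, y) pairs
        self.stored_grads = []              # stored gradient per reserved point
        self.grad_avg = np.zeros(dim)       # running average of stored gradients
        self.seen = 0                       # total points observed in the stream

    def observe(self, x, y, grad_fn):
        # Reservoir sampling: a new point replaces a reserved one with probability N / seen.
        self.seen += 1
        g = grad_fn(self.w, x, y)
        if len(self.points) < self.N:
            self.points.append((x, y))
            self.stored_grads.append(g)
            self.grad_avg += (g - self.grad_avg) / len(self.points)
        else:
            j = np.random.randint(self.seen)
            if j < self.N:                  # evict a uniformly chosen reserved point
                self.grad_avg += (g - self.stored_grads[j]) / self.N
                self.points[j] = (x, y)
                self.stored_grads[j] = g
        self._saga_step(grad_fn)

    def _saga_step(self, grad_fn):
        # SAGA-style update on one reserved point: step along g_new - g_old + grad_avg.
        i = np.random.randint(len(self.points))
        x, y = self.points[i]
        g_new = grad_fn(self.w, x, y)
        g_old = self.stored_grads[i]
        self.w -= self.step_size * (g_new - g_old + self.grad_avg)
        self.grad_avg += (g_new - g_old) / len(self.points)
        self.stored_grads[i] = g_new

# Example usage with a least-squares gradient (an assumed loss, for illustration only):
#   def grad_fn(w, x, y): return (w @ x - y) * x
#   model = ReservoirSAGA(dim=10, reservoir_size=100, step_size=0.01)
#   for x, y in stream: model.observe(x, y, grad_fn)

The asynchronous parallel variant described in the abstract would additionally run such updates from multiple threads over a lock-free reservoir, updating the shared parameters without locking in the spirit of Hogwild-style schemes; that machinery is omitted from this sketch.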
