Faster learning by reduction of data access time

Nowadays, a major challenge in machine learning is the ‘Big Data’ challenge. Big data problems, arising from a large number of data points, a large number of features per data point, or both, have made the training of models very slow. Training time has two major components: the time to access the data and the time to process (learn from) the data. So far, research has focused mainly on the second component, i.e., learning from the data. In this paper, we propose one possible solution to handle big data problems in machine learning: reducing training time by reducing data access time, using systematic sampling and cyclic/sequential sampling to select mini-batches from the dataset. To demonstrate the effectiveness of the proposed sampling techniques, we use empirical risk minimization, a commonly used machine learning problem, in the strongly convex and smooth case. The problem is solved using SAG, SAGA, SVRG, SAAG-II and MBSGD (mini-batched SGD), each with two step-size determination techniques, namely constant step size and backtracking line search. Theoretical results prove that, in expectation, systematic and cyclic sampling converge at the same rate as the widely used random sampling technique. Experimental results on benchmark datasets demonstrate the efficacy of the proposed sampling techniques and show up to six times faster training.
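The sketch below (not the authors' code) illustrates, under stated assumptions, the two proposed sampling schemes: systematic sampling picks one random start and then every k-th example, while cyclic/sequential sampling sweeps the data in order. Both produce ordered index sets, so a mini-batch is read as a strided or contiguous sweep rather than a scatter of random accesses, which is the source of the reduced data access time. The function names, the choice of l2-regularized logistic regression as the empirical risk minimization instance, and the constant step size are illustrative assumptions.

```python
import numpy as np

def systematic_minibatch(n, batch_size, rng):
    """Systematic sampling: draw one random start in [0, k) and take every
    k-th index, where k = n // batch_size. Indices come out sorted."""
    k = max(n // batch_size, 1)
    start = rng.integers(0, k)
    return np.arange(start, n, k)[:batch_size]

def cyclic_minibatches(n, batch_size):
    """Cyclic/sequential sampling: yield contiguous mini-batches in order,
    restarting from the beginning on the next pass over the data."""
    for i in range(0, n, batch_size):
        yield np.arange(i, min(i + batch_size, n))

def sgd_step(w, X, y, batch, lr=0.1, lam=1e-4):
    """One mini-batched SGD step for l2-regularized logistic regression
    (a strongly convex, smooth empirical risk minimization problem)."""
    Xb, yb = X[batch], y[batch]
    margin = yb * (Xb @ w)
    grad = -(Xb * (yb / (1.0 + np.exp(margin)))[:, None]).mean(axis=0) + lam * w
    return w - lr * grad

# Toy usage: synthetic data, systematic batches of size 32.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))
y = np.sign(rng.standard_normal(1000))
w = np.zeros(20)
for _ in range(50):
    batch = systematic_minibatch(len(y), 32, rng)
    w = sgd_step(w, X, y, batch)
```

The same batch generators can be plugged into variance-reduced solvers such as SAG, SAGA or SVRG in place of uniformly random index draws; the paper's theoretical results concern exactly that substitution.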
