A fast parallel SGD for matrix factorization in shared memory systems

Matrix factorization is known to be an effective method for recommender systems in which only the ratings given by users to items are available. Currently, stochastic gradient descent (SGD) is one of the most popular algorithms for matrix factorization. However, because SGD is inherently sequential, it is difficult to parallelize for web-scale problems. In this paper, we develop a fast parallel SGD method, FPSGD, for shared memory systems. By dramatically reducing the cache-miss rate and carefully addressing the load balance of threads, FPSGD is more efficient than state-of-the-art parallel algorithms for matrix factorization.
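
For context, the sequential baseline mentioned above can be sketched as follows. This is a minimal illustration of plain SGD for matrix factorization, assuming an L2-regularized squared loss; it is not the FPSGD method itself, and the names `sgd_mf` and `rmse`, the hyperparameter values, and the toy data are all hypothetical. Each update reads and writes one row of P and one row of Q, so two ratings that share a user or an item cannot safely be updated at the same time, which is one way to see why the method is hard to parallelize.

```python
import math
import random


def sgd_mf(ratings, n_users, n_items, k=4, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Plain sequential SGD for R ~ P Q^T (illustrative sketch only).

    ratings: list of (user, item, rating) triples with 0-based indices.
    Returns the factor matrices P (n_users x k) and Q (n_items x k).
    """
    rng = random.Random(seed)
    # Small random initialization of the latent factors.
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)  # visit the ratings in a random order each epoch
        for u, v, r in order:
            pu, qv = P[u], Q[v]
            err = r - sum(a * b for a, b in zip(pu, qv))
            for f in range(k):
                puf, qvf = pu[f], qv[f]
                # Gradient step on the regularized squared error of one rating.
                pu[f] += lr * (err * qvf - reg * puf)
                qv[f] += lr * (err * puf - reg * qvf)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error of the factorization on the given ratings."""
    se = 0.0
    for u, v, r in ratings:
        se += (r - sum(a * b for a, b in zip(P[u], Q[v]))) ** 2
    return math.sqrt(se / len(ratings))


# Toy data (assumed for illustration): 3 users, 3 items, 6 observed ratings.
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q = sgd_mf(toy, n_users=3, n_items=3)
print("training RMSE:", rmse(toy, P, Q))
```

In this sketch the inner loop touches only row `u` of P and row `v` of Q, so ratings from disjoint sets of users and items could in principle be processed independently.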
