A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems

Matrix factorization is known to be an effective method for recommender systems that are given only the ratings from users to items. Currently, stochastic gradient (SG) method is one of the most popular algorithms for matrix factorization. However, as a sequential approach, SG is difficult to be parallelized for handling web-scale problems. In this article, we develop a fast parallel SG method, FPSG, for shared memory systems. By dramatically reducing the cache-miss rate and carefully addressing the load balance of threads, FPSG is more efficient than state-of-the-art parallel algorithms for matrix factorization.

[1]  Guy E. Blelloch,et al.  GraphChi: Large-Scale Graph Computation on Just a PC , 2012, OSDI.

[2]  Yehuda Koren,et al.  Lessons from the Netflix prize challenge , 2007, SKDD.

[3]  H. Robbins A Stochastic Approximation Method , 1951 .

[4]  Inderjit S. Dhillon,et al.  NOMAD: Nonlocking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion , 2013, Proc. VLDB Endow..

[5]  Cho-Jui Hsieh,et al.  Coordinate Descent Method for Large-scale L 2-loss Linear SVM , 2008 .

[6]  Stephen J. Wright,et al.  Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.

[7]  Chih-Jen Lin,et al.  A fast parallel SGD for matrix factorization in shared memory systems , 2013, RecSys.

[8]  Chih-Jen Lin,et al.  Coordinate Descent Method for Large-scale L2-loss Linear Support Vector Machines , 2008, J. Mach. Learn. Res..

[9]  Inderjit S. Dhillon,et al.  Fast coordinate descent methods with variable selection for non-negative matrix factorization , 2011, KDD.

[10]  Yehuda Koren,et al.  Matrix Factorization Techniques for Recommender Systems , 2009, Computer.

[11]  Gideon S. Mann,et al.  Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models , 2009, NIPS.

[12]  Gideon S. Mann,et al.  MapReduce/Bigtable for Distributed Optimization , 2010 .

[13]  Gideon S. Mann,et al.  Distributed Training Strategies for the Structured Perceptron , 2010, NAACL.

[14]  Yehuda Koren,et al.  The Yahoo! Music Dataset and KDD-Cup '11 , 2012, KDD Cup.

[15]  Hai Yang,et al.  ACM Transactions on Intelligent Systems and Technology - Special Section on Urban Computing , 2014 .

[16]  Alexander J. Smola,et al.  Parallelized Stochastic Gradient Descent , 2010, NIPS.

[17]  Domonkos Tikk,et al.  Fast als-based matrix factorization for explicit and implicit feedback datasets , 2010, RecSys '10.

[18]  Peter J. Haas,et al.  Large-scale matrix factorization with distributed stochastic gradient descent , 2011, KDD.

[19]  Dennis M. Wilkinson,et al.  Large-Scale Parallel Collaborative Filtering for the Netflix Prize , 2008, AAIM.

[20]  Inderjit S. Dhillon,et al.  Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems , 2012, 2012 IEEE 12th International Conference on Data Mining.

[21]  J. Kiefer,et al.  Stochastic Estimation of the Maximum of a Regression Function , 1952 .