Accelerating recommender systems using GPUs

We describe GPU implementations of two matrix factorization recommender algorithms, CCD++ and ALS. We compare the processing time and predictive ability of the GPU implementations with those of existing multi-core versions of the same algorithms. The GPU implementations outperform the multi-core versions, with a maximum speedup of 14.8×.
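
To make the kind of computation being accelerated concrete, the following is a minimal sketch of alternating least squares for the rank-one case, where each regularized least-squares solve reduces to a scalar division. It is not the GPU code described above: the toy `RATINGS` data, the function names, and the parameters `lam` and `sweeps` are all assumptions made for illustration. The point it shows is that, with one factor held fixed, the update of each user (or item) entry is independent of the others, which is what makes the loop a candidate for parallel execution.

```python
# Hypothetical toy data: (user, item, rating) triples of a sparse rating matrix.
RATINGS = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 1.0)]


def half_sweep(ratings, fixed, n_free, free_axis, lam):
    """Update every entry of the free factor while the other factor is fixed.

    For rank 1, the regularized least-squares solution for entry i is
        x_i = sum_j r_ij * f_j / (lam + sum_j f_j ** 2)
    summed over the observed ratings of i. Entries do not depend on each
    other, so the per-entry work could be distributed across threads.
    """
    num = [0.0] * n_free
    den = [lam] * n_free
    for triple in ratings:
        i = triple[free_axis]        # index into the factor being updated
        j = triple[1 - free_axis]    # index into the fixed factor
        r = triple[2]
        num[i] += r * fixed[j]
        den[i] += fixed[j] ** 2
    return [n / d for n, d in zip(num, den)]


def objective(ratings, u, v, lam):
    """Squared error on observed ratings plus L2 regularization."""
    err = sum((r - u[i] * v[j]) ** 2 for i, j, r in ratings)
    reg = lam * (sum(x * x for x in u) + sum(x * x for x in v))
    return err + reg


def rank_one_als(ratings, n_users, n_items, lam=0.1, sweeps=10):
    """Alternate between user and item updates, recording the objective."""
    u = [1.0] * n_users
    v = [1.0] * n_items
    history = [objective(ratings, u, v, lam)]
    for _ in range(sweeps):
        u = half_sweep(ratings, v, n_users, 0, lam)  # users free, items fixed
        v = half_sweep(ratings, u, n_items, 1, lam)  # items free, users fixed
        history.append(objective(ratings, u, v, lam))
    return u, v, history


if __name__ == "__main__":
    u, v, history = rank_one_als(RATINGS, n_users=3, n_items=3)
    print("objective per sweep:", [round(h, 4) for h in history])
```

Because each half-sweep minimizes the objective exactly with respect to one factor, the recorded objective never increases from one sweep to the next. A full-rank version replaces the scalar division with a small linear solve per user or item.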
