An Efficient Implementation of the ALS-WR Algorithm on x86 CPUs

With the continuing development of computing and big data technology, recommendation systems are increasingly applied in online music, online movies, games, online shopping, and similar fields to cope with information redundancy and to recommend products that interest users. In this paper, we implement and accelerate the Alternating-Least-Squares with Weighted-\(\lambda \)-Regularization (ALS-WR) algorithm by adopting a two-level parallel strategy on x86-64 Zen-based CPUs. ALS-WR, one of the most widely used recommendation algorithms, is based on matrix factorization, a linear-algebra technique that factorizes a matrix into a product of matrices and serves here as a form of dimensionality reduction. Vector and matrix operations are therefore the computational core of the ALS-WR algorithm, and accelerating these computational kernels effectively improves its overall performance. The experimental results show that our high-performance ALS-WR implementation achieves a running time of 185.09 s (with 100 features and 30 iterations) on the MovieLens 20M dataset.
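To make concrete why vector and matrix operations dominate the cost, the following is a minimal single-threaded NumPy sketch of the standard ALS-WR update. It is not the optimized two-level parallel implementation described in this paper. The function name `als_wr`, the dense rating matrix with 0 meaning "not rated", and the default parameter values are all assumptions made for illustration.

```python
import numpy as np


def als_wr(ratings, n_features=2, lam=0.05, n_iters=10, seed=0):
    """Illustrative ALS-WR on a dense matrix; 0 is assumed to mean 'not rated'."""
    rng = np.random.default_rng(seed)
    observed = ratings != 0
    n_users, n_items = ratings.shape
    U = rng.random((n_users, n_features))  # user feature vectors (rows)
    M = rng.random((n_items, n_features))  # item feature vectors (rows)
    n_u = observed.sum(axis=1)             # number of ratings per user
    n_m = observed.sum(axis=0)             # number of ratings per item

    def solve_side(fixed, R, mask, out):
        # For each row i, solve the regularized normal equations
        #   (F^T F + lam * n_i * I) x = F^T r_i
        # where F holds the fixed-side vectors of the entries rated in row i.
        for i in range(R.shape[0]):
            idx = np.flatnonzero(mask[i])
            if idx.size == 0:
                continue  # no ratings: leave the vector unchanged
            F = fixed[idx]
            A = F.T @ F + lam * idx.size * np.eye(n_features)
            b = F.T @ R[i, idx]
            out[i] = np.linalg.solve(A, b)

    objective = []
    for _ in range(n_iters):
        solve_side(M, ratings, observed, U)      # fix items, update users
        solve_side(U, ratings.T, observed.T, M)  # fix users, update items
        err = (ratings - U @ M.T)[observed]
        reg = (n_u * (U ** 2).sum(axis=1)).sum() + (n_m * (M ** 2).sum(axis=1)).sum()
        objective.append(float((err ** 2).sum() + lam * reg))
    return U, M, objective
```

Each row update is independent of the others on the same side, and each one reduces to a small matrix product followed by a dense solve. In this sketch those two operations are the `F.T @ F` product and the `np.linalg.solve` call, which correspond to the computational kernels the abstract refers to.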
