Paper information - PSL: Exploiting Parallelism, Sparsity and Locality to Accelerate Matrix Factorization on x86 Platforms


Matrix factorization is a basis for many recommendation systems. Although alternating least squares with weighted-\(\lambda\)-regularization (ALS-WR) is widely used for matrix factorization in collaborative filtering, it suffers from insufficient parallel execution and inefficient memory access. We therefore propose PSL, a solution that accelerates the ALS-WR algorithm by exploiting parallelism, sparsity and locality on x86 platforms. PSL can process 20 million ratings, and multi-threading yields a speedup of up to 14.5\(\times\) on a 20-core machine.
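For readers unfamiliar with ALS-WR, the following is a minimal pure-Python sketch of the textbook update rule, not PSL's implementation. It assumes ratings arrive as `(user, item, value)` triples; the function names (`solve`, `update_side`, `als_wr`, `objective`) and the default `rank`, `lam` and `n_iters` values are hypothetical choices made for illustration. Each user (or item) factor is obtained by solving a small regularized least-squares system in which \(\lambda\) is scaled by the number of ratings that user (or item) has.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def update_side(target, fixed, index, lam):
    """Recompute every row of `target` while `fixed` is held constant.

    Each row's system depends only on `fixed`, so the rows are independent
    of one another; this sketch simply processes them sequentially.
    """
    rank = len(fixed[0])
    for row, entries in enumerate(index):
        if not entries:
            continue  # no ratings for this row: leave its factor unchanged
        n = len(entries)  # weighted-lambda: regularization scales with n
        A = [[lam * n if p == q else 0.0 for q in range(rank)]
             for p in range(rank)]
        b = [0.0] * rank
        for col, r in entries:  # only observed ratings are visited
            f = fixed[col]
            for p in range(rank):
                b[p] += f[p] * r
                for q in range(rank):
                    A[p][q] += f[p] * f[q]
        target[row] = solve(A, b)

def als_wr(ratings, n_users, n_items, rank=2, lam=0.1, n_iters=10, seed=0):
    """Alternate between solving for user factors U and item factors V."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    for _ in range(n_iters):
        update_side(U, V, by_user, lam)
        update_side(V, U, by_item, lam)
    return U, V

def objective(ratings, U, V, lam):
    """Squared error on observed ratings plus the weighted-lambda penalty."""
    n_u, n_v = [0] * len(U), [0] * len(V)
    loss = 0.0
    for u, i, r in ratings:
        pred = sum(a * b for a, b in zip(U[u], V[i]))
        loss += (r - pred) ** 2
        n_u[u] += 1
        n_v[i] += 1
    reg = sum(n * sum(x * x for x in row) for n, row in zip(n_u, U))
    reg += sum(n * sum(x * x for x in row) for n, row in zip(n_v, V))
    return loss + lam * reg
```

Calling `als_wr(ratings, n_users, n_items)` on a small list of triples returns the two factor matrices as nested lists, and `objective` can be used to check that the regularized loss does not increase from one sweep to the next. The sketch favors readability over speed: it uses no threading, vectorization or cache-aware layout.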

Minyi Guo | Chao Li | Jing Wang | Pengyu Wang | Weixin Deng
