PSL: Exploiting Parallelism, Sparsity and Locality to Accelerate Matrix Factorization on x86 Platforms
Minyi Guo | Chao Li | Jing Wang | Pengyu Wang | Weixin Deng
[1] Kai Hwang, et al. Edge AIBench: Towards Comprehensive End-to-end Edge Computing Benchmarking, 2018, Bench.
[2] Stijn Eyerman, et al. Many-Core Graph Workload Analysis, 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] C Hollowell, et al. The Effect of NUMA Tunings on CPU Performance, 2015.
[4] Daniel Kusswurm. Advanced Vector Extensions (AVX), 2014.
[5] Tao Tang, et al. Efficient and Portable ALS Matrix Factorization for Recommender Systems, 2017, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[6] Yanjun Wu, et al. RVTensor: A Light-Weight Neural Network Inference Framework Based on the RISC-V Architecture, 2019, Bench.
[7] Endong Wang, et al. Intel Math Kernel Library, 2014.
[8] Guangli Li, et al. XDN: Towards Efficient Inference of Residual Neural Networks on Cambricon Chips, 2019, Bench.
[9] Yuchen Zhang, et al. HPC AI500: A Benchmark Suite for HPC AI Systems, 2018, Bench.
[10] Fan Zhang, et al. AIoT Bench: Towards Comprehensive Benchmarking Mobile and Embedded Device Intelligence, 2018, Bench.
[11] Tianshu Hao, et al. The Implementation and Optimization of Matrix Decomposition Based Collaborative Filtering Task on X86 Platform, 2019, Bench.
[12] Dennis M. Wilkinson, et al. Large-Scale Parallel Collaborative Filtering for the Netflix Prize, 2008, AAIM.
[13] Minghe Yu, et al. AIBench: An Industry Standard Internet Service AI Benchmark Suite, 2019, ArXiv.
[14] Xu Wen, et al. Improving RGB-D Face Recognition via Transfer Learning from a Pretrained 2D Network, 2019, Bench.
[15] Torsten Hoefler, et al. NUMA-aware shared-memory collective communication for MPI, 2013, HPDC.
[16] Fan Zhang, et al. AIBench: Towards Scalable and Comprehensive Datacenter AI Benchmarking, 2018, Bench.
[17] Brandon Lucia, et al. Combining Data Duplication and Graph Reordering to Accelerate Parallel Graph Processing, 2019, HPDC.
[18] Zheng Wang, et al. Adaptive Optimization of Sparse Matrix-Vector Multiplication on Emerging Many-Core Architectures, 2018, 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS).
[19] Nicolas Gillis, et al. Accelerating Nonnegative Matrix Factorization Algorithms Using Extrapolation, 2018, Neural Computation.
[20] Xiaosong Ma, et al. Exploiting Locality in Graph Analytics through Hardware-Accelerated Traversal Scheduling, 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[21] Nicolas Gillis, et al. Accelerated Multiplicative Updates and Hierarchical ALS Algorithms for Nonnegative Matrix Factorization, 2011, Neural Computation.
[22] Minyi Guo, et al. Excavating the Potential of GPU for Accelerating Graph Traversal, 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[23] Intel® Guide for Developing Multithreaded Applications Part 1: Application Threading and Synchronization Summary, 2010.
[24] Nectarios Koziris, et al. SparseX: A Library for High-Performance Sparse Matrix-Vector Multiplication on Multicore Platforms, 2018, ACM Trans. Math. Softw.