Analyzing the energy-efficiency of sparse matrix multiplication on heterogeneous systems: A comparative study of GPU, Xeon Phi and FPGA
Constantine Bekas | Heiner Giefers | Peter W. J. Staar | Christoph Hagleitner
[1] Mark Horowitz,et al. Energy dissipation in general purpose microprocessors , 1996, IEEE J. Solid State Circuits.
[2] John D. Davis,et al. BLAS Comparison on FPGA, CPU and GPU , 2010, 2010 IEEE Computer Society Annual Symposium on VLSI.
[3] Wu-chun Feng,et al. Trends in energy-efficient computing: A perspective from the Green500 , 2013, 2013 International Green Computing Conference Proceedings.
[4] Tony M. Brewer,et al. Instruction Set Innovations for the Convey HC-1 Computer , 2010, IEEE Micro.
[5] Martin Burtscher,et al. Measuring GPU Power with the K20 Built-in Sensor , 2014, GPGPU@ASPLOS.
[6] Alan D. George,et al. Performance and productivity evaluation of hybrid-threading HLS versus HDLs , 2015, 2015 IEEE High Performance Extreme Computing Conference (HPEC).
[7] Yves Lhuillier,et al. A unified methodology for a fast benchmarking of parallel architecture , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[8] Gerhard Wellein,et al. A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units , 2013, SIAM J. Sci. Comput..
[9] Collin McCurdy,et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite , 2010, GPGPU-3.
[10] Kim M. Hazelwood,et al. Where is the data? Why you cannot debate CPU vs. GPU performance without the answer , 2011, (IEEE ISPASS) IEEE International Symposium on Performance Analysis of Systems and Software.
[11] Ümit V. Çatalyürek,et al. Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi , 2013, PPAM.
[12] Laurent Lefèvre,et al. A survey on techniques for improving the energy efficiency of large-scale distributed systems , 2014, ACM Comput. Surv..
[13] Youcef Saad,et al. A Basic Tool Kit for Sparse Matrix Computations , 1990 .
[14] Wayne Luk,et al. Performance Comparison of Graphics Processors to Reconfigurable Logic: A Case Study , 2010, IEEE Transactions on Computers.
[15] Eric S. Chung,et al. Towards a Universal FPGA Matrix-Vector Multiplication Architecture , 2012, 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines.
[16] Yong Wang,et al. SDA: Software-defined accelerator for large-scale DNN systems , 2014, 2014 IEEE Hot Chips 26 Symposium (HCS).
[17] Feng Zhao,et al. Energy aware consolidation for cloud computing , 2008, CLUSTER 2008.
[18] Greg Brown,et al. A performance and energy comparison of FPGAs, GPUs, and multicores for sliding-window applications , 2012, FPGA '12.
[19] Rahul Khanna,et al. RAPL: Memory power estimation and capping , 2010, 2010 ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED).
[20] M. Horowitz,et al. Low-power digital design , 1994, Proceedings of 1994 IEEE Symposium on Low Power Electronics.
[21] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[22] Richard W. Vuduc,et al. Model-driven autotuning of sparse matrix-vector multiply on GPUs , 2010, PPoPP '10.
[23] Pat Hanrahan,et al. A Streaming Supercomputer , 2001 .
[24] Wu-chun Feng,et al. The Green500 List , 2007 .
[25] Joseph L. Greathouse,et al. Efficient Sparse Matrix-Vector Multiplication on GPUs Using the CSR Storage Format , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[26] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[27] Shengen Yan,et al. yaSpMV: yet another SpMV framework on GPUs , 2014, PPoPP.
[28] Alan D. George,et al. Comparative analysis of OpenCL vs. HDL with image-processing kernels on Stratix-V FPGA , 2015, 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP).
[29] Manish Gupta,et al. Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors , 2000, IEEE Micro.
[30] Richard W. Vuduc,et al. A performance analysis framework for identifying potential benefits in GPGPU applications , 2012, PPoPP '12.
[31] Jing Zhang,et al. OpenCL and the 13 dwarfs: a work in progress , 2012, ICPE '12.
[32] Heiner Giefers,et al. Analyzing the energy-efficiency of dense linear algebra kernels by power-profiling a hybrid CPU/FPGA system , 2014, 2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors.
[33] Yan Zhang,et al. FPGA vs. GPU for sparse matrix vector multiply , 2009, 2009 International Conference on Field-Programmable Technology.
[34] Constantine Bekas,et al. Stochastic Matrix-Function Estimators: Scalable Big-Data Kernels with High Performance , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[35] Vojin G. Oklobdzija. The Computer Engineering Handbook , 2007 .
[36] Kurt Keutzer,et al. clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs , 2012, ICS '12.
[37] Luiz André Barroso,et al. The Case for Energy-Proportional Computing , 2007, Computer.
[38] Richard Vuduc,et al. Automatic performance tuning of sparse matrix kernels , 2003 .
[39] John Wawrzynek,et al. Bridging the GPGPU-FPGA efficiency gap , 2011, FPGA '11.
[40] Nectarios Koziris,et al. Performance evaluation of the sparse matrix-vector multiplication on modern architectures , 2009, The Journal of Supercomputing.
[41] Margo I. Seltzer,et al. The case for application-specific benchmarking , 1999, Proceedings of the Seventh Workshop on Hot Topics in Operating Systems.
[42] Jason D. Bakos,et al. A Sparse Matrix Personality for the Convey HC-1 , 2011, 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines.
[43] Xing Liu,et al. Efficient sparse matrix-vector multiplication on x86-based many-core processors , 2013, ICS '13.
[44] Rolf Clauberg,et al. 4.4 Energy-efficient microserver based on a 12-core 1.8GHz 188K-CoreMark 28nm bulk CMOS 64b SoC for big-data applications with 159GB/S/L memory bandwidth system density , 2015, 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers.
[45] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[46] Jeffrey Stuecheli,et al. CAPI: A Coherent Accelerator Processor Interface , 2015, IBM J. Res. Dev..
[47] Shuaiwen Song,et al. The Power-Performance Tradeoffs of the Intel Xeon Phi on HPC Applications , 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.
[48] Yu Ting Chen,et al. A Survey and Evaluation of FPGA High-Level Synthesis Tools , 2016, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[49] Andreas Moshovos,et al. Demystifying GPU microarchitecture through microbenchmarking , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).