Time and energy modeling of high-performance Level-3 BLAS on x86 architectures
Enrique S. Quintana-Ortí | Rafael Mayo | Francisco D. Igual | Sandra Catalán | Rafael Rodríguez-Sánchez | Pedro Alonso