Revisiting conventional task schedulers to exploit asymmetry in multi-core architectures for dense linear algebra operations
Enrique S. Quintana-Ortí | Francisco D. Igual | Sandra Catalán | Rafael Rodríguez-Sánchez | Katzalin Olcoz | Luis Costero
[1] David Black-Schaffer, et al. The HIPEAC vision for advanced computing in horizon 2020, 2013.
[2] Eduard Ayguadé, et al. The Mont-Blanc Prototype: An Alternative Approach for HPC Systems, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Gene H. Golub, et al. Matrix computations, 1983.
[4] Uri C. Weiser, et al. Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors, 2006, IEEE Computer Architecture Letters.
[5] Jesús Labarta, et al. Parallelizing dense and banded linear algebra libraries using SMPSs, 2009, Concurr. Comput. Pract. Exp.
[6] Jack J. Dongarra, et al. A set of level 3 basic linear algebra subprograms, 1990, TOMS.
[7] Robert A. van de Geijn, et al. Anatomy of high-performance matrix multiplication, 2008, TOMS.
[8] Robert A. van de Geijn, et al. Anatomy of High-Performance Many-Threaded Matrix Multiplication, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[9] Karthikeyan Sankaralingam, et al. Dark silicon and the end of multicore scaling, 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[10] Jesús Labarta, et al. Parallelizing dense and banded linear algebra libraries using SMPSs, 2009.
[11] Robert A. van de Geijn, et al. BLIS: A Framework for Rapidly Instantiating BLAS Functionality, 2015, ACM Trans. Math. Softw.
[12] Mateo Valero, et al. Supercomputing with commodity CPUs: Are mobile SoCs ready for HPC?, 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[13] Eduard Ayguadé, et al. Criticality-Aware Dynamic Task Scheduling for Heterogeneous Architectures, 2015, ICS.
[14] James Demmel, et al. LAPACK Users' Guide, Third Edition, 1999, Software, Environments and Tools.
[15] Robert A. van de Geijn, et al. Programming matrix algorithms-by-blocks for thread-level parallelism, 2009, TOMS.
[16] R.H. Dennard, et al. Design Of Ion-implanted MOSFET's with Very Small Physical Dimensions, 1974, Proceedings of the IEEE.
[17] Jack J. Dongarra, et al. Dense linear algebra solvers for multicore with GPU accelerators, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[18] Cédric Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.
[19] Bruno Raffin, et al. XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[20] Enrique S. Quintana-Ortí, et al. Refactoring Conventional Task Schedulers to Exploit Asymmetric ARM big.LITTLE Architectures in Dense Linear Algebra, 2016, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[21] Rafael Mayo, et al. Architecture-aware configuration and scheduling of matrix multiplication on asymmetric multicore processors, 2015, Cluster Computing.
[22] Tze Meng Low, et al. The BLIS Framework, 2016.