A LIGHTWEIGHT RUN-TIME SUPPORT FOR FAST DENSE LINEAR ALGEBRA ON MULTI-CORE

This work proposes MDF, a lightweight dynamic run-time support that achieves high performance when executing dense linear algebra kernels on shared-cache multi-core processors. MDF implements a dynamic macro-dataflow interpreter that processes DAGs generated on-the-fly from standard numeric kernel code. Experimental results show that the performance MDF obtains on both fine-grain and coarse-grain problems is comparable to, or better than, that of de-facto standard solutions (notably the PLASMA library), which rely on separate run-time supports specifically optimised for different computational grains on modern multi-core processors.
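The abstract describes MDF as a macro-dataflow interpreter that builds a DAG on-the-fly from ordinary kernel code and fires each task once its inputs are available. The Python sketch below illustrates that idea under stated assumptions. It is not MDF's implementation or API. The names `DataflowRuntime`, `submit`, and `tiled_cholesky` are hypothetical. Each "tile" is a single matrix element so the example stays short. Dependencies are inferred from the tiles each task reads and writes: a task waits for the last writer of every tile it touches and, before writing a tile, for earlier readers of it. A right-looking Cholesky factorization stands in as a representative dense linear algebra kernel. Because of CPython's global interpreter lock, the worker threads show dependency-driven firing rather than real parallel speedup.

```python
import math
import threading
from collections import deque
from functools import partial

class Task:
    """One macro-dataflow instruction; `pending` counts unfinished predecessors."""
    def __init__(self, fn):
        self.fn, self.pending, self.successors, self.done = fn, 0, [], False

class DataflowRuntime:
    """Hypothetical macro-dataflow interpreter (a sketch, not MDF's code).
    submit() derives DAG edges on the fly from the tiles a task accesses."""
    def __init__(self, n_workers=4):
        self.cv = threading.Condition()
        self.ready = deque()   # fireable tasks
        self.last_writer = {}  # tile -> last task that writes it
        self.readers = {}      # tile -> tasks reading it since its last write
        self.submitted = self.completed = 0
        self.closed, self.errors = False, []
        self.workers = [threading.Thread(target=self._work) for _ in range(n_workers)]
        for w in self.workers:
            w.start()

    def submit(self, fn, reads=(), writes=()):
        t = Task(fn)
        with self.cv:
            preds = {self.last_writer[k] for k in (*reads, *writes) if k in self.last_writer}
            for k in writes:  # write-after-read: also wait for earlier readers
                preds.update(self.readers.get(k, ()))
            live = [p for p in preds if not p.done]
            for p in live:
                p.successors.append(t)
            t.pending = len(live)
            for k in reads:
                self.readers.setdefault(k, []).append(t)
            for k in writes:
                self.last_writer[k] = t
                self.readers[k] = []
            self.submitted += 1
            if t.pending == 0:
                self.ready.append(t)
                self.cv.notify_all()

    def _work(self):
        while True:
            with self.cv:
                while not self.ready and not (self.closed and self.completed == self.submitted):
                    self.cv.wait()
                if not self.ready:
                    return  # closed and every submitted task has completed
                t = self.ready.popleft()
            try:
                t.fn()  # run the kernel outside the lock
            except Exception as e:  # record it so the graph still drains
                self.errors.append(e)
            with self.cv:  # fire successors whose inputs are now all available
                t.done = True
                self.completed += 1
                for s in t.successors:
                    s.pending -= 1
                    if s.pending == 0:
                        self.ready.append(s)
                self.cv.notify_all()

    def close(self):
        with self.cv:  # no more submissions; workers exit once all tasks finish
            self.closed = True
            self.cv.notify_all()
        for w in self.workers:
            w.join()

# Element-wise "tile" kernels of a right-looking lower Cholesky factorization.
def potrf(a, k): a[k][k] = math.sqrt(a[k][k])
def trsm(a, i, k): a[i][k] /= a[k][k]
def syrk(a, i, k): a[i][i] -= a[i][k] * a[i][k]
def gemm(a, i, j, k): a[i][j] -= a[i][k] * a[j][k]

def tiled_cholesky(a, n_workers=4):
    """In-place Cholesky of an SPD matrix; the lower triangle of a holds L."""
    n = len(a)
    rt = DataflowRuntime(n_workers)
    for k in range(n):  # ordinary sequential loop nest; the runtime builds the DAG
        rt.submit(partial(potrf, a, k), reads=[(k, k)], writes=[(k, k)])
        for i in range(k + 1, n):
            rt.submit(partial(trsm, a, i, k), reads=[(k, k), (i, k)], writes=[(i, k)])
        for i in range(k + 1, n):
            rt.submit(partial(syrk, a, i, k), reads=[(i, k), (i, i)], writes=[(i, i)])
            for j in range(k + 1, i):
                rt.submit(partial(gemm, a, i, j, k),
                          reads=[(i, k), (j, k), (i, j)], writes=[(i, j)])
    rt.close()
    if rt.errors:
        raise rt.errors[0]
    return a

A = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
L = tiled_cholesky([row[:] for row in A])
print([L[i][:i + 1] for i in range(3)])  # [[2.0], [6.0, 1.0], [-8.0, 5.0, 3.0]]
```

The loop nest in `tiled_cholesky` is the plain sequential algorithm. The DAG emerges from the `reads`/`writes` annotations as tasks are submitted, while the workers may already be executing earlier tasks. A runtime aiming at real performance would presumably use coarse tiles backed by optimised BLAS kernels and a cheaper synchronisation mechanism than a single global condition variable.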
