Level-3 BLAS on the TI C6678 Multi-core DSP
Robert A. van de Geijn | Francisco D. Igual | Murtaza Ali | Eric Stotzer
[1] Ed Anderson, et al. LAPACK Users' Guide, 1995.
[2] Robert A. van de Geijn, et al. Elemental: A New Framework for Distributed Memory Dense Matrix Computations, 2013, TOMS.
[3] Robert A. van de Geijn, et al. High-performance implementation of the level-3 BLAS, 2008, TOMS.
[4] Jack Dongarra, et al. LAPACK Users' Guide (third ed.), 1999.
[5] Jack Dongarra, et al. MPI: The Complete Reference, 1996.
[6] Robert A. van de Geijn, et al. Anatomy of high-performance matrix multiplication, 2008, TOMS.
[7] Robert A. van de Geijn, et al. The libflame Library for Dense Matrix Computations, 2009, Computing in Science & Engineering.
[8] Bo Kågström, et al. GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark, 1998, TOMS.
[9] Carl Kesselman, et al. Generalized communicators in the Message Passing Interface, 1996, Proceedings. Second MPI Developer's Conference.
[10] Francisco D. Igual, et al. Unleashing DSPs for General-Purpose HPC. FLAME Working Note #61, 2012.
[11] Jack J. Dongarra, et al. A set of level 3 basic linear algebra subprograms, 1990, TOMS.