Programming the Adapteva Epiphany 64-core network-on-chip coprocessor
Bob Edwards | Alistair P. Rendell | Gaurav Mitra | Anish Varghese
[1] Saurabh Dighe, et al. The 48-core SCC Processor: the Programmer's View, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[2] Jack J. Dongarra. The Impact of Multicore on Math Software and Exploiting Single Precision Computing to Obtain Double Precision Results, 2006, ISPA.
[3] Lynn Elliot Cannon, et al. A cellular computer to implement the Kalman filter algorithm, 1969.
[4] S. Borkar, et al. An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS, 2008, IEEE Journal of Solid-State Circuits.
[5] Matthias S. Müller, et al. OpenMP in the Era of Low Power Devices and Accelerators, 2013, Lecture Notes in Computer Science.
[6] Pradeep Dubey, et al. Design and Implementation of the Linpack Benchmark for Single and Multi-node Systems Based on Intel® Xeon Phi Coprocessor, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[7] John E. Stone, et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, 2010, Computing in Science & Engineering.
[8] David Wentzlaff, et al. Processor: A 64-Core SoC with Mesh Interconnect, 2010.
[9] Brad Calder, et al. Automatically characterizing large scale program behavior, 2002, ASPLOS X.
[10] Alistair P. Rendell, et al. OpenMP on the Low-Power TI Keystone II ARM/DSP System-on-Chip, 2013, IWOMP.
[11] Jun Zhou, et al. Use of SIMD Vector Operations to Accelerate Application Code Performance on Low-Powered ARM and Intel Platforms, 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum.
[12] Robert A. van de Geijn, et al. SUMMA: scalable universal matrix multiplication algorithm, 1995, Concurr. Pract. Exp.
[13] Tarek El-Ghazawi, et al. Experiences with UPC on TILE-64 processor, 2011, 2011 Aerospace Conference.
[14] Yaniv Sapir (Adapteva). Scalable Parallel Multiplication of Big Matrices, 2012.
[15] Bronis R. de Supinski, et al. OpenMP for Accelerators, 2011, IWOMP.
[16] Jaeyoung Choi, et al. PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers, 1994, Concurr. Pract. Exp.
[17] Robert A. van de Geijn, et al. Unleashing the high-performance and low-power of multi-core DSPs for general-purpose HPC, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[18] Samuel Williams, et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors, 2007, SIAM Rev.
[19] Robert A. van de Geijn, et al. SUMMA: Scalable Universal Matrix Multiplication Algorithm, 1995.
[20] Gerhard Wellein, et al. Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory, 2009, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW).
[21] Christoph Kessler, et al. Efficient On-Chip Pipelined Streaming Computations on Scalable Manycore Architectures, 2012.
[22] Julien Langou, et al. Exploiting Mixed Precision Floating Point Hardware in Scientific Computations, 2006, High Performance Computing Workshop.
[23] Timothy G. Mattson, et al. Programming the Intel 80-core network-on-a-chip Terascale Processor, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.