Communication Optimization for the 16-Core Epiphany Floating-Point Processor Array

Managing and optimizing communication on a bespoke network-on-chip (NoC) computing platform such as the Parallella (Zynq 7010 + Epiphany-III SoC) is critical to the performance and energy efficiency of floating-point bulk-synchronous parallel (BSP) workloads. In this paper, we explore the opportunities and capabilities of the Epiphany-III SoC for communication-intensive workloads. Using our communication support library for the Epiphany, we accelerate single-precision BSP workloads such as sparse matrix-vector multiplication (SpMV) on Matrix Market datasets by up to 6.5× and the PageRank algorithm on the BerkStan SNAP dataset by up to 8×, while lowering power usage by 2× compared with optimized ARM-based implementations. Compared with optimized OpenMP x86 mappings, we observe a ≈10× improvement in energy efficiency (GFLOP/s/W) with the Epiphany SoC.