Benchmarking SW26010 Many-Core Processor

Equipped with the Chinese home-grown SW26010 many-core processor, TaihuLight claimed the top place in the TOP500 list released in June 2016. Although some large-scale applications have run successfully on the supercomputer, few studies have analyzed the performance impact of its extremely memory-bound architecture design. Understanding the architecture and performance characteristics of SW26010 is essential for native in-depth optimization and performance modeling. We therefore developed a suite of micro-benchmarks written in C and assembly and characterized the key architectural components: (1) the pipelines of the computing processing elements; (2) the software-controlled memory hierarchy; and (3) the lightweight on-chip communication at the register level. Our benchmark results indicate that comprehensive optimizations, from high-level algorithm design to low-level instruction scheduling, are required to achieve good performance on SW26010.