Intel's 80-core terascale processor was the first generally programmable microprocessor to break the teraflops barrier. The primary goal for the chip was to study power management and on-die communication technologies. When announced in 2007, it received a great deal of attention for running a stencil kernel at 1.0 single-precision TFLOPS while using only 97 watts. The literature about the chip, however, focused on the hardware and said little about the software environment or the kernels used to evaluate the chip. This paper completes the literature on the 80-core terascale processor by fully defining the chip's software environment. We describe the instruction set, the programming environment, the kernels written for the chip, and our experiences programming this microprocessor. We close by discussing the lessons learned from this project and what they imply for future message-passing, network-on-a-chip processors.