Exploring the Design Space of an Energy-Efficient Accelerator for the SKA1-Low Central Signal Processor

The Square Kilometre Array (SKA) will be the largest radio telescope ever built, with unprecedented sensitivity, angular resolution, and survey speed. Collectively, the SKA’s antennas are expected to gather exabytes of data per second and store one petabyte of data every day, requiring exascale processing (on the order of 10^18 operations per second). This paper focuses on SKA1-Low, the SKA’s aperture-array instrument, which consists of 131,072 antennas and will be built in the first phase of the project’s deployment. In particular, our work explores the design of a custom architecture for the central signal processor (CSP) of SKA1-Low. The CSP processes digitized samples sent by antennas that receive extraterrestrial radio-frequency signals between 50 and 350 MHz. We describe the challenges in building the CSP and present a quantitative study of a custom hardware architecture for executing the main CSP algorithms. By taking advantage of emerging 3D-stacked-memory devices and by exploring the design space for a 14-nm implementation, we estimate for our architecture a power consumption of 9.62 W for processing all channels of a sub-band and an application-level energy efficiency of up to 312 GFLOPS/W.
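To put the headline figures in perspective, the following is a rough back-of-the-envelope sketch in Python, not part of the paper's methodology. It converts the stated storage volume into a sustained rate and multiplies the reported power by the reported peak efficiency. Two assumptions are made here that the abstract does not confirm: a petabyte is taken as 10^15 bytes, and the 9.62 W and 312 GFLOPS/W figures are treated as belonging to the same design point. Since the efficiency is quoted as "up to", the resulting throughput is at best an upper bound. The function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def sustained_rate_gb_per_s(bytes_per_day: float) -> float:
    """Convert a daily data volume into an average rate in GB/s (1 GB = 1e9 bytes)."""
    return bytes_per_day / SECONDS_PER_DAY / 1e9


def implied_throughput_gflops(power_w: float, efficiency_gflops_per_w: float) -> float:
    """Throughput implied by a power budget and an energy efficiency."""
    return power_w * efficiency_gflops_per_w


# Assumption: one petabyte = 1e15 bytes (decimal prefix).
storage_rate = sustained_rate_gb_per_s(1e15)

# Assumption: both figures describe the same design point; because the
# efficiency is an "up to" value, treat the result as an upper bound.
throughput = implied_throughput_gflops(9.62, 312.0)

print(f"Average storage rate: {storage_rate:.2f} GB/s")
print(f"Implied upper-bound throughput: {throughput / 1000:.2f} TFLOPS")
```

Under these assumptions, storing one petabyte per day corresponds to an average of roughly 11.6 GB/s, and the two accelerator figures together would imply about 3 TFLOPS for one sub-band.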
