An energy-efficient custom architecture for the SKA1-low central signal processor

The Square Kilometre Array (SKA) will be the largest radio telescope ever built, with unprecedented sensitivity, angular resolution, and survey speed. This paper explores the design of a custom architecture for the central signal processor (CSP) of SKA1-Low, the SKA's aperture-array instrument, which consists of 131,072 antennas. The SKA1-Low antennas receive signals between 50 and 350 MHz. After digitization and preliminary processing, the samples are moved to the CSP for further processing. In this work, we describe the challenges in building the CSP and present a first quantitative study of the implementation of a custom hardware architecture for the main CSP algorithms. By taking advantage of emerging 3D-stacked-memory devices and by exploring the design space for a 14-nm implementation, we estimate for our architecture a power consumption of 14.4 W for processing all channels of a sub-band and an application-level energy efficiency of up to 208 GFLOPS/W.
