Spatial: a language and compiler for application accelerators

Industry is increasingly turning to reconfigurable architectures like FPGAs and CGRAs for improved performance and energy efficiency. Unfortunately, adoption of these architectures has been limited by their programming models. HDLs lack abstractions for productivity and are difficult to target from higher-level languages. HLS tools are more productive, but offer an ad hoc mix of software and hardware abstractions that makes performance optimization difficult. In this work, we describe a new domain-specific language and compiler called Spatial for higher-level descriptions of application accelerators. We describe Spatial's hardware-centric abstractions for both programmer productivity and design performance, and summarize the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning. We demonstrate the language's ability to target FPGAs and CGRAs from common source code. We show that applications written in Spatial are, on average, 42% shorter and achieve a mean speedup of 2.9x over SDAccel HLS when targeting a Xilinx UltraScale+ VU9P FPGA on an Amazon EC2 F1 instance.
