S2FA: An Accelerator Automation Framework for Heterogeneous Computing in Datacenters

Big data analytics using the JVM-based MapReduce framework has become a popular approach to address the explosive growth of data sizes. Adopting FPGAs in datacenters as accelerators to improve performance and energy efficiency also attracts increasing attention. However, the integration of FPGAs into such JVM-based frameworks raises the challenge of poor programmability. Programmers must not only rewrite Java/Scala programs to C/C++ or OpenCL, but, to achieve high performance, they must also take into consideration the intricacies of FPGAs. To address this challenge, we present S2FA (Spark-to-FPGA-Accelerator), an automation framework that generates FPGA accelerator designs from Apache Spark programs written in Scala. S2FA bridges the semantic gap between object-oriented languages and HLS C while achieving high performance using learning-based design space exploration. Evaluation results show that our generated FPGA designs achieve up to 49.9× performance improvement for several machine learning applications compared to their corresponding implementations on the JVM.

[1]  Michèle Sebag,et al.  Analyzing bandit-based adaptive operator selection mechanisms , 2010, Annals of Mathematics and Artificial Intelligence.

[2]  Luca P. Carloni,et al.  On learning-based methods for design-space exploration with High-Level Synthesis , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[3]  Yun Liang,et al.  Design space exploration of multiple loops on FPGAs using high level synthesis , 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).

[4]  Kazutoshi Wakabayashi,et al.  Machine learning predictive modelling high-level synthesis design space exploration , 2012, IET Comput. Digit. Tech..

[5]  Roberto M. Rodríguez,et al.  Image Segmentation via an Iterative Algorithm of the Mean Shift Filtering for Different Values of the Stopping Threshold , 2012 .

[6]  Rudolf Eigenmann,et al.  Compiler Infrastructure , 2013, International Journal of Parallel Programming.

[7]  Kunle Olukotun,et al.  Automatic Generation of Efficient Accelerators for Reconfigurable Hardware , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[8]  Wei Zhang,et al.  Melia: A MapReduce Framework on OpenCL-Based FPGAs , 2016, IEEE Transactions on Parallel and Distributed Systems.

[9]  Jason Cong,et al.  When apache spark meets FPGAs: a case study for next-generation DNA sequencing acceleration , 2016, CloudCom 2016.

[10]  Jason Cong,et al.  High-Level Synthesis for FPGAs: From Prototyping to Deployment , 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[11]  Jason Cong,et al.  When Spark Meets FPGAs: A Case Study for Next-Generation DNA Sequencing Acceleration , 2016, 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[12]  Scott Shenker,et al.  Spark: Cluster Computing with Working Sets , 2010, HotCloud.

[13]  Wei Zhang,et al.  A performance analysis framework for optimizing OpenCL applications on FPGAs , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[14]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[15]  Vittorio Zaccaria,et al.  SPIRIT: Spectral-Aware Pareto Iterative Refinement Optimization for Supervised High-Level Synthesis , 2015, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[16]  Shoaib Kamil,et al.  OpenTuner: An extensible framework for program autotuning , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).

[17]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[18]  Zhiru Zhang,et al.  A Parallel Bandit-Based Approach for Autotuning FPGA Compilation , 2017, FPGA.

[19]  Jason Cong,et al.  Source-to-Source Optimization for HLS , 2016, FPGAs for Software Programmers.

[20]  Dirk Koch,et al.  FPGAs for Software Programmers , 2016 .

[21]  Martin Margala,et al.  SparkCL: A Unified Programming Framework for Accelerators on Heterogeneous Clusters , 2015, ArXiv.

[22]  Esley Torres,et al.  Image Segmentation Through an Iterative Algorithm of the Mean Shift , 2012 .

[23]  Jason Cong,et al.  Improving polyhedral code generation for high-level synthesis , 2013, 2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[24]  Jason Cong,et al.  Software Infrastructure for Enabling FPGA-Based Accelerations in Data Centers: Invited Paper , 2016, ISLPED.

[25]  Sang Joon Kim,et al.  A Mathematical Theory of Communication , 2006 .

[26]  Claude E. Shannon,et al.  The mathematical theory of communication , 1950 .

[27]  Kunle Olukotun,et al.  Generating Configurable Hardware from Parallel Patterns , 2015, ASPLOS.

[28]  Jason Cong,et al.  Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale , 2016, SoCC.

[29]  Sanjay Ghemawat,et al.  MapReduce: simplified data processing on large clusters , 2008, CACM.