Towards FPGA-assisted spark: An SVM training acceleration case study

A system that augments the Apache Spark data processing framework with FPGA accelerators is presented as a way to model and deploy FPGA-assisted applications in large-scale clusters. In our proposed framework, FPGAs can optionally be used in place of the host CPU for Resilient distributed datasets (RDDs) transformations, allowing seamless integration between gateware and software processing. Using the case of training an Support Vector Machine (SVM) cell image classifier as a case study, we explore the feasibilities, benefits and challenges of such technique. In our experiments where data communication between CPU and FPGA is tightly controlled, a consistent speedup of up to 1.6x can be achieved for the target SVM training application as the cluster size increases. Hardware-software techniques that are crucial to achieve acceleration such as the management of data partition size are explored. We demonstrate the benefits of the proposed framework, while also illustrate the importance of careful hardware-software management to avoid excessive CPU-FPGA communication that can quickly diminish the benefits of FPGA acceleration.

[1]  Chih-Jen Lin,et al.  A dual coordinate descent method for large-scale linear SVM , 2008, ICML '08.

[2]  Hayden Kwok-Hay So,et al.  Map-reduce processing of k-means algorithm with FPGA-accelerated computer cluster , 2014, 2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors.

[3]  Kermin Fleming,et al.  The LEAP FPGA operating system , 2014, 2014 24th International Conference on Field Programmable Logic and Applications (FPL).

[4]  Joshua S. Auerbach,et al.  Lime: a Java-compatible and synthesizable language for heterogeneous architectures , 2010, OOPSLA.

[5]  Yu Wang,et al.  FPMR: MapReduce framework on FPGA , 2010, FPGA '10.

[6]  Philip Heng Wai Leong,et al.  Map-reduce as a Programming Model for Custom Computing Machines , 2008, 2008 16th International Symposium on Field-Programmable Custom Computing Machines.

[7]  Edmund Y. Lam,et al.  Asymmetric-detection time-stretch optical microscopy (ATOM) for ultrafast high-contrast cellular imaging in flow , 2013, Scientific Reports.

[8]  Michael J. Franklin,et al.  Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[9]  Matei Zaharia,et al.  Resilient Distributed Datasets , 2016 .

[10]  Xinyu Niu,et al.  Accelerated cell imaging and classification on FPGAs for quantitative-phase asymmetric-detection time-stretch optical microscopy , 2015, 2015 International Conference on Field Programmable Technology (FPT).

[11]  Ryan Kastner,et al.  RIFFA 2.0: A reusable integration framework for FPGA accelerators , 2013, 2013 23rd International Conference on Field programmable Logic and Applications.

[12]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..