Energy-efficient acceleration of MapReduce applications using FPGAs

Abstract In this paper, we present a full end-to-end implementation of big data analytics applications on a heterogeneous CPU+FPGA architecture. Selecting the architecture that yields the highest acceleration for big data applications requires an in-depth analysis of each application. Thus, we develop MapReduce implementations of K-means, K-nearest neighbor, support vector machine, and naive Bayes in a Hadoop Streaming environment, which allows mapper functions to be developed in a non-Java language suited for interfacing with an FPGA-based hardware acceleration environment. We further profile the various components of Hadoop MapReduce to identify candidates for hardware acceleration, and we accelerate the mapper functions through hardware+software (HW+SW) co-design. Moreover, we study how parameters at the application level (size of input data), the system level (number of mappers running simultaneously per node and data split size), and the architecture level (choice of CPU core, big vs. little, e.g., Xeon vs. Atom) affect the performance and power-efficiency benefits of Hadoop Streaming hardware acceleration and the overall performance and energy efficiency of the system. Promising speedup and energy-efficiency gains of up to 8.3× and 15×, respectively, are achieved in an end-to-end Hadoop implementation. Our results show that HW+SW acceleration yields significantly higher speedup on the Atom server, narrowing the performance gap between little and big cores after acceleration. On the other hand, HW+SW acceleration reduces the power consumption of the Xeon server more significantly, narrowing the power gap between little and big cores. Our cost analysis shows that the FPGA-accelerated Atom server achieves execution times close to or even lower than a stand-alone Xeon server for the studied applications, while reducing server cost by more than 3×. We confirm the scalability of FPGA acceleration of MapReduce by increasing the data size on a 12-node Xeon cluster and show that FPGA acceleration maintains its benefit for larger data sizes on a cluster.
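To make the Hadoop Streaming setup concrete, the sketch below shows a minimal non-Java (C++) K-means mapper of the kind described above: it reads feature vectors from stdin, assigns each point to its nearest centroid, and emits centroid-id/point pairs on stdout for a reducer to aggregate. This is an illustrative sketch, not the paper's actual code; the side-data file name (centroids.txt), the comma-separated record format, and the helper names are assumptions. In the HW+SW co-designed version, the distance computation inside the mapper is the kind of kernel that would be offloaded to the FPGA.

```cpp
// kmeans_mapper.cpp -- illustrative Hadoop Streaming mapper (sketch, not the paper's code).
// Assumptions (hypothetical): centroids are shipped as side data in "centroids.txt",
// one comma-separated vector per line; stdin records are comma-separated feature vectors.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using Vec = std::vector<double>;

// Parse a comma-separated line into a feature vector.
static Vec parseVec(const std::string& line) {
    Vec v;
    std::stringstream ss(line);
    std::string tok;
    while (std::getline(ss, tok, ',')) {
        if (!tok.empty()) v.push_back(std::stod(tok));
    }
    return v;
}

// Squared Euclidean distance; in a HW+SW co-design this inner kernel is the
// natural candidate for FPGA offload.
static double sqDist(const Vec& a, const Vec& b) {
    double d = 0.0;
    for (size_t i = 0; i < a.size() && i < b.size(); ++i) {
        const double diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// Index of the centroid nearest to point p.
static size_t nearestCentroid(const Vec& p, const std::vector<Vec>& centroids) {
    size_t best = 0;
    double bestDist = sqDist(p, centroids[0]);
    for (size_t c = 1; c < centroids.size(); ++c) {
        const double d = sqDist(p, centroids[c]);
        if (d < bestDist) { bestDist = d; best = c; }
    }
    return best;
}

int main() {
    // Load the current iteration's centroids (distributed to each mapper as side data).
    std::vector<Vec> centroids;
    std::ifstream cf("centroids.txt");
    for (std::string line; std::getline(cf, line); ) {
        if (!line.empty()) centroids.push_back(parseVec(line));
    }
    if (centroids.empty()) return 1;

    // Map phase: emit <centroid-id TAB original-record> for each input point.
    for (std::string line; std::getline(std::cin, line); ) {
        if (line.empty()) continue;
        const Vec p = parseVec(line);
        if (p.empty()) continue;
        std::cout << nearestCentroid(p, centroids) << '\t' << line << '\n';
    }
    return 0;
}
```

A mapper like this could be launched through the standard streaming jar, e.g. with -files centroids.txt, -mapper pointing at the compiled binary, and a reducer that averages the points per centroid id to produce the next iteration's centroids; the exact job configuration used in the paper is not shown here.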
