Accelerating Machine Learning Kernel in Hadoop Using FPGAs

Big data applications share inherent characteristics that are fundamentally different from traditional desktop CPU, parallel, and web service applications. They rely on deep machine learning and data mining applications. A recent trend in big data analytics is to provide heterogeneous architectures that support hardware specialization, so that the right processing engine can be constructed for analytics applications. However, these specialized heterogeneous architectures require extensive exploration of design aspects to find the optimal architecture in terms of performance and cost. Given the time needed to create such specialized architectures, a model that estimates the potential speedup achievable by offloading various parts of an algorithm to specialized hardware is necessary. This paper analyzes how offloading computationally intensive kernels of machine learning algorithms to a heterogeneous CPU+FPGA platform enhances performance. We use the latest Xilinx Zynq boards for implementation and result analysis. Furthermore, we perform a comprehensive analysis of communication and computation overheads, such as data I/O movement and calls to several standard libraries that cannot be offloaded to the accelerator, to understand how the speedup of each application's kernel contributes to its overall execution in an end-to-end Hadoop MapReduce environment.
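The gap between kernel speedup and end-to-end speedup can be illustrated with an Amdahl's-law-style estimate. The sketch below is not the model used in the paper; the function name `end_to_end_speedup`, its parameters, and the example numbers are all assumptions chosen for illustration. It treats the non-offloadable part of a job (for example, library calls and Hadoop framework work) as fixed and adds CPU-to-FPGA data movement as an extra cost.

```python
def end_to_end_speedup(kernel_fraction, kernel_speedup, transfer_overhead=0.0):
    """Estimate overall speedup when only one kernel is offloaded.

    All arguments are hypothetical inputs, not values from the paper:
      kernel_fraction   -- share of the original runtime spent in the
                           offloadable kernel, in [0, 1]
      kernel_speedup    -- speedup of that kernel on the accelerator (> 0)
      transfer_overhead -- added data-movement time, expressed as a
                           fraction of the original runtime (>= 0)
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    if transfer_overhead < 0.0:
        raise ValueError("transfer_overhead must be non-negative")

    # Normalized new runtime: untouched part + accelerated kernel + transfers.
    new_time = (
        (1.0 - kernel_fraction)
        + kernel_fraction / kernel_speedup
        + transfer_overhead
    )
    return 1.0 / new_time


if __name__ == "__main__":
    # A 10x kernel speedup on 80% of the runtime, with 2% transfer cost.
    print(f"{end_to_end_speedup(0.8, 10.0, 0.02):.2f}")  # prints 3.33
```

Under these assumed numbers, a 10x faster kernel yields only about a 3.3x faster job, which is why the overheads outside the accelerated kernel need to be measured in the full MapReduce run.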
