Design Space Exploration for Hardware Acceleration of Machine Learning Applications in MapReduce

Emerging big data applications rely heavily on machine learning algorithms that are computationally intensive. To meet computational requirements as well as power and scalability challenges, FPGA-based hardware accelerators have found their way into data centers and cloud infrastructures. Recent efforts on hardware acceleration of big data mainly accelerate a particular application and deploy it on a specific architecture that fits its performance and power requirements well. Given the diversity of architectures and ML applications, an important research question is which architecture is best suited to meet the performance, power, and energy-efficiency requirements of a diverse range of ML-based analytics applications. In this work, we answer this question by investigating how the type of FPGA (low-end vs. high-end), its integration with the CPU (on-chip vs. off-chip), and the choice of CPU (high-performance big vs. low-power little servers) affect the achievable speedup and power reduction in a CPU+FPGA architecture for machine learning applications implemented in MapReduce. We show that among the three architectural parameters, the type of CPU is the most dominant factor in determining the execution time and power of a CPU+FPGA architecture for MapReduce applications. The integration technology and the FPGA type come next, with power and performance least sensitive to the FPGA type.