论文信息 - FPMR: MapReduce Framework on FPGA A Case Study of RankBoost Acceleration

FPMR: MapReduce Framework on FPGA A Case Study of RankBoost Acceleration

learning and data mining are gaining increasing attentions of the computing society. FPGA provides a highly parallel, low power, and flexible hardware platform for this domain, while the difficulty of programming FPGA greatly limits its prevalence. MapReduce is a parallel programming framework that could easily utilize inherent parallelism in algorithms. In this paper, we describe FPMR, a MapReduce framework on FPGA, which provides programming abstraction, hardware architecture, and basic building blocks to developers. An on-chip processor scheduler is implemented to maximize the utilization of computation resources and achieve better load balancing. An efficient data access scheme is carefully designed to maximize data reuse and throughput. Meanwhile, the FPMR framework hides the task control, synchronization, and communication away from designers so that more attention can be paid to the application itself. A case study of RankBoost acceleration based on FPMR demonstrates that FPMR efficiently helps with the development productivity; and the speedup is 31.8x versus CPU-based implementation. This performance is comparable to a fully manually designed version, which achieves 33.5x speedup. Two other applications: SVM, PageRank are also discussed to show the generalization of the framework.

[1] Lei Zhang,et al. FPGA Acceleration of RankBoost in Web Search Engines , 2009, TRETS.

[2] Philip Heng Wai Leong,et al. Map-reduce as a Programming Model for Custom Computing Machines , 2008, 2008 16th International Symposium on Field-Programmable Custom Computing Machines.

[3] J. Platt. Sequential Minimal Optimization : A Fast Algorithm for Training Support Vector Machines , 1998 .

[4] Rajeev Motwani,et al. The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[5] Nachiket Kapre,et al. Design patterns for reconfigurable computing , 2004, 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[6] Bernhard E. Boser,et al. A training algorithm for optimal margin classifiers , 1992, COLT '92.

[7] Kurt Keutzer,et al. Fast support vector machine training and classification on graphics processors , 2008, ICML '08.

[8] Christoforos E. Kozyrakis,et al. Evaluating MapReduce for Multi-core and Multiprocessor Systems , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[9] Benjamin Rose,et al. CellMR: A framework for supporting mapreduce on asymmetric cell-based clusters , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.

[10] Christoforos E. Kozyrakis,et al. Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[11] Naga K. Govindaraju,et al. Mars: A MapReduce Framework on graphics processors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[12] Martin C. Herbordt,et al. Achieving High Performance with FPGA-Based Computing , 2007, Computer.

[13] Kunle Olukotun,et al. Map-Reduce for Machine Learning on Multicore , 2006, NIPS.

[14] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[15] Yoram Singer,et al. An Efficient Boosting Algorithm for Combining Preferences by , 2013 .