Data-Intensive Computing Acceleration with Python in Xilinx FPGA

Data-intensive workloads drive the development of hardware design. These data-intensive services are driven by the rising trend of applying novel machine learning techniques, such as CNNs and RNNs, to massive collections of data objects. They require novel devices with configurable, high-throughput I/O (e.g., for data-based model training) and large computational capacity (e.g., for large numbers of convolutional operations). In this paper, we present our early work on realizing a Python-based Field-Programmable Gate Array (FPGA) system to support such data-intensive services. Our current system deploys a lightweight CNN optimization layer and a mixed hardware setup, including multiple FPGA and GPU nodes, to provide performance acceleration at run time. Our prototype supports popular machine learning platforms such as Caffe. Our initial empirical results suggest that the system handles data-intensive learning services effectively.
