Kibo: An Open-Source Fixed-Point Tool-kit for Training and Inference in FPGA-Based Deep Learning Networks

Field-Programmable Gate Arrays (FPGAs) have become an essential component of the deep learning landscape, providing a balance of flexibility, customization, and efficiency. One of the key optimizations afforded by FPGA technology is the ability to customize the bit width of fixed-point weights and activations within deep learning networks. In this paper, we present an open-source tool-kit that allows a researcher to investigate different fixed-point representations and saturating arithmetic operations in Python. The tool-kit overrides the arithmetic and comparison operators commonly used in deep learning structures, allowing a researcher to quickly evaluate the impact of alternative numeric representations. Compared to higher-level frameworks such as TensorFlow or PyTorch, a much wider set of numeric precisions can be modeled. Unlike lower-level C-synthesis tools, our tool-kit is written in Python, providing the ability to explore architectural alternatives much more rapidly. Our framework is open-source and is available online.
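To illustrate the operator-overloading approach described above, the sketch below shows a minimal saturating fixed-point value class in Python. The class name, Q-format parameters, and method set are illustrative assumptions for exposition only; they are not Kibo's actual API.

```python
class Fixed:
    """Illustrative saturating fixed-point value (not Kibo's actual API).

    Stores a signed Q(int_bits).(frac_bits) number and overloads the
    arithmetic and comparison operators so existing Python model code
    can be evaluated under reduced-precision semantics.
    """

    def __init__(self, value, int_bits=4, frac_bits=4):
        self.int_bits = int_bits
        self.frac_bits = frac_bits
        scale = 1 << frac_bits
        max_raw = (1 << (int_bits + frac_bits - 1)) - 1
        min_raw = -(1 << (int_bits + frac_bits - 1))
        # Quantize to the nearest representable value, saturating on overflow.
        raw = int(round(value * scale))
        self.raw = max(min_raw, min(max_raw, raw))

    @property
    def value(self):
        return self.raw / (1 << self.frac_bits)

    def __add__(self, other):
        return Fixed(self.value + float(other), self.int_bits, self.frac_bits)

    def __mul__(self, other):
        return Fixed(self.value * float(other), self.int_bits, self.frac_bits)

    def __lt__(self, other):
        return self.value < float(other)

    def __float__(self):
        return self.value

    def __repr__(self):
        return f"Fixed({self.value}, Q{self.int_bits}.{self.frac_bits})"


if __name__ == "__main__":
    a = Fixed(1.75)    # Q4.4: range [-8, 7.9375], resolution 0.0625
    b = Fixed(6.5)
    print(a + b)       # 8.25 exceeds the range and saturates to 7.9375
    print(a * 0.3)     # 0.525 is quantized to 0.5
```

Because the reduced-precision behaviour lives in the overloaded operators, the same model code can be run first at full precision and then with different bit widths by changing only how the values are constructed, which is the kind of rapid exploration the abstract refers to.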
