Single Precision Natural Logarithm Architecture for Hard Floating-Point and DSP-Enabled FPGAs

In this paper we present a novel method for implementing floating-point (FP) elementary functions using the new single-precision FP addition and multiplication features of the Altera Arria 10 DSP Block architecture. Our application example is log(x), one of the most commonly required functions for emerging datacenter and computing FPGA targets. We explain why the combination of new FPGA technology and a simultaneous, massive increase in computing performance requirements fuels the need for this work. We present a comprehensive error analysis, both for the overall function and for each subsection of the architecture, demonstrating that the hard FP (HFP) Blocks, in conjunction with the traditional flexibility and connectivity of the FPGA, can provide a robust and high-performance solution. These methods create a highly accurate single-precision IEEE 754 function that is OpenCL conformant. Our methods map almost exclusively to embedded structures and therefore significantly reduce logic resources and routing stress compared to current methods; they also demonstrate that newly introduced FPGA routing architectures can be leveraged to use almost no soft resources. We also show that the latency of the log(x) function can be changed independently of the architecture and function, allowing the performance of the function to be adjusted directly to the system clock rate.
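To illustrate how log(x) can be reduced to table lookups plus FP multiplications and additions, the following is a minimal sketch of a generic table-driven range reduction. It is not the architecture described in this paper: the table size, polynomial degree, and all names (`TABLE_BITS`, `TABLE`, `log_sketch`) are assumptions chosen for illustration, and the arithmetic uses Python's double-precision floats rather than single-precision hardware operators.

```python
import math

TABLE_BITS = 6            # assumed table size (64 entries), illustration only
N = 1 << TABLE_BITS
LN2 = math.log(2.0)

# Each entry stores (1/c, ln(c)) for the centre c of one sub-interval of [1, 2).
TABLE = []
for i in range(N):
    c = 1.0 + (i + 0.5) / N
    TABLE.append((1.0 / c, math.log(c)))


def log_sketch(x):
    """Approximate ln(x) for finite x > 0 using lookups, multiplies and adds."""
    if not (x > 0.0) or math.isinf(x):
        raise ValueError("log_sketch expects a finite, positive input")

    # Split x = m * 2**e with m in [1, 2).
    m, e = math.frexp(x)          # frexp returns m in [0.5, 1)
    m *= 2.0
    e -= 1

    # The leading fraction bits of m select the table entry.
    i = int((m - 1.0) * N)
    inv_c, log_c = TABLE[i]

    # Reduced argument: m / c = 1 + r, with |r| <= 1/(2N).
    r = m * inv_c - 1.0

    # Short Taylor polynomial for ln(1 + r), evaluated in Horner form.
    p = r * (1.0 + r * (-0.5 + r * (1.0 / 3.0 + r * (-0.25))))

    # ln(x) = e*ln(2) + ln(c) + ln(1 + r)
    return e * LN2 + log_c + p
```

After the table lookup, every step is a multiplication or an addition, which is the kind of operation a hard FP DSP block provides. A hardware design would still need to handle the special cases (zero, negative inputs, infinities, NaN) and control rounding explicitly, which this sketch does not attempt.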
