A matrix-multiply unit for posits in reconfigurable logic leveraging (open)CAPI

In this paper, we present the design, in reconfigurable logic, of a matrix multiplier for matrices of 32-bit posit numbers with es=2 [1]. Vector dot products are computed without intermediate rounding, as the proposed posit standard suggests, to retain as much precision as possible. An initial implementation targets the CAPI 1.0 interface on the POWER8 processor and achieves about 10 Gpops (giga posit operations per second). Follow-on implementations targeting CAPI 2.0 and OpenCAPI 3.0 on POWER9 are expected to achieve up to 64 Gpops. Our design is available under a permissive open source license at https://github.com/ChenJianyunp/Unum_matrix_multiplier. We hope that the current CAPI 1.0 implementation, together with future community contributions, will enable a more extensive exploration of this proposed new format.
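To illustrate why a dot product computed without intermediate rounding retains more precision, the following is a minimal software sketch, not the hardware design from the repository. It assumes the usual posit bit layout (sign, regime, up to es exponent bits, fraction) for 32-bit posits with es=2, and it uses Python's exact `Fraction` type as a stand-in for a wide accumulator. The names `posit_to_fraction` and `exact_dot` are hypothetical, and the single final rounding back to a posit is left out for brevity.

```python
from fractions import Fraction

NBITS, ES = 32, 2             # 32-bit posits with es=2, as in the abstract
MASK = (1 << NBITS) - 1
NAR = 1 << (NBITS - 1)        # bit pattern of NaR ("not a real")


def posit_to_fraction(bits):
    """Decode a 32-bit posit bit pattern into an exact rational value."""
    bits &= MASK
    if bits == 0:
        return Fraction(0)
    if bits == NAR:
        raise ValueError("NaR has no numeric value")
    negative = bool(bits >> (NBITS - 1))
    if negative:
        bits = (-bits) & MASK          # negative posits are two's complement
    n = NBITS - 1                      # bits that follow the sign bit
    body = bits & (NAR - 1)
    first = (body >> (n - 1)) & 1
    run = 1                            # length of the regime run
    while run < n and ((body >> (n - 1 - run)) & 1) == first:
        run += 1
    regime = run - 1 if first else -run
    rem_len = max(n - run - 1, 0)      # bits left after the regime terminator
    rem = body & ((1 << rem_len) - 1)
    exp_len = min(ES, rem_len)
    frac_len = rem_len - exp_len
    # Exponent bits cut off by a long regime are treated as zeros.
    exponent = (rem >> frac_len) << (ES - exp_len)
    frac = rem & ((1 << frac_len) - 1)
    scale = regime * (1 << ES) + exponent
    magnitude = Fraction(2) ** scale * (1 + Fraction(frac, 1 << frac_len))
    return -magnitude if negative else magnitude


def exact_dot(xs, ys):
    """Accumulate all products exactly; no rounding happens between terms."""
    acc = Fraction(0)
    for x, y in zip(xs, ys):
        acc += posit_to_fraction(x) * posit_to_fraction(y)
    return acc


# Bit patterns for 2**64, 1 and -2**64 under the assumed layout.
BIG, ONE, NEG_BIG = 0x7FFFC000, 0x40000000, 0x80004000
print(exact_dot([BIG, ONE, NEG_BIG], [ONE, ONE, ONE]))  # exact sum is 1
```

In this example the partial sum 2^64 + 1 is not representable as a 32-bit posit, so rounding after each addition would lose the 1 and give 0 after the final term cancels. Keeping the accumulator exact preserves it.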

[1] Ulrich W. Kulisch, et al. Arithmetic for vector processors, 1987, 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH).

[2] John L. Gustafson, et al. Beating Floating Point at its Own Game: Posit Arithmetic, 2017, Supercomput. Front. Innov.

[3] Krste Asanovic, et al. A Hardware Accelerator for Computing an Exact Dot Product, 2017, 2017 IEEE 24th Symposium on Computer Arithmetic (ARITH).

[4] Zaid Al-Ars, et al. Maximizing systolic array efficiency to accelerate the PairHMM Forward Algorithm, 2016, 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

[5] E.E. Swartzlander, et al. Floating-Point Fused Multiply-Add Architectures, 2007, 2007 Conference Record of the Forty-First Asilomar Conference on Signals, Systems and Computers.

[6] Jeffrey Stuecheli, et al. CAPI: A Coherent Accelerator Processor Interface, 2015, IBM J. Res. Dev.

[7] Gustafson, et al. Beating Floating Point at its Own Game, 2017.

[8] Wolfgang Rülling, et al. Exact accumulation of floating-point numbers, 1991, IEEE Symposium on Computer Arithmetic.