The Floating-Point Unit of the Jaguar x86 Core

The AMD Jaguar x86 core uses a fully synthesized, 128-bit native floating-point unit (FPU) organized on a co-processor model. The Jaguar FPU supports several x86 ISA extensions, including x87, MMX, SSE1 through SSE4.2, AES, CLMUL, AVX, and F16C. The front end of the unit decodes two complex operations per cycle and uses a dedicated renamer (RN), free list (FL), and retire queue (RQ) for in-order dispatch and retire. A dedicated out-of-order, dual-issue scheduler issues operations to the execution units, which source their operands from a synthesized physical register file (PRF) and bypass network. The back end of the unit has two execution pipes: the first contains a vector integer ALU, a vector integer MUL unit, and a floating-point adder (FPA); the second contains a vector integer ALU, a store-convert unit, and a floating-point iterative multiplier (FPM). The implementation focused on low-power design and on optimizing vectorized single-precision (SP) performance. Verification of the unit required complex pseudo-random and formal verification techniques. The Jaguar FPU is built in a 28 nm CMOS process.
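To make the two-pipe, dual-issue organization described above concrete, the following is a minimal toy model in Python, not a description of the actual Jaguar scheduler. The unit-to-pipe mapping follows the text; everything else is an assumption made for illustration: the names `Op`, `PIPE_UNITS`, and `schedule`, the oldest-ready-first pick policy, and the latency values.

```python
from dataclasses import dataclass

# Execution units per pipe, as listed in the text above.
PIPE_UNITS = {
    0: {"valu", "vmul", "fpa"},  # vector int ALU, vector int MUL, FP adder
    1: {"valu", "stc", "fpm"},   # vector int ALU, store-convert, FP multiplier
}


@dataclass
class Op:
    name: str
    unit: str          # one of the unit tags in PIPE_UNITS
    srcs: tuple = ()   # names of producer ops this op depends on
    latency: int = 1   # hypothetical latency in cycles (not from the source)


def schedule(ops):
    """Return {op name: (issue cycle, pipe)} for a list of ops in program order.

    Assumed policy: each cycle, each pipe issues the oldest pending op that
    it can execute and whose source operands are available.
    """
    supported = set().union(*PIPE_UNITS.values())
    for op in ops:
        if op.unit not in supported:
            raise ValueError(f"no pipe executes unit {op.unit!r}")

    names = {op.name for op in ops}
    ready_at = {}        # op name -> cycle at which its result is available
    issued = {}
    pending = list(ops)  # kept in program (age) order
    cycle = 0
    while pending:
        for pipe, units in PIPE_UNITS.items():
            for op in pending:
                # Sources not produced by any op in the list are treated as
                # already available in the register file.
                srcs_ready = all(
                    s not in names or (s in ready_at and ready_at[s] <= cycle)
                    for s in op.srcs
                )
                if op.unit in units and srcs_ready:
                    issued[op.name] = (cycle, pipe)
                    ready_at[op.name] = cycle + op.latency
                    pending.remove(op)
                    break  # at most one op per pipe per cycle
        cycle += 1
    return issued


if __name__ == "__main__":
    program = [
        Op("m", "fpm", latency=3),  # multiply, issues on pipe 1
        Op("n", "fpa", srcs=("m",)),  # add that waits for the multiply
        Op("p", "fpa"),             # independent add, younger than n
    ]
    for name, (cyc, pipe) in schedule(program).items():
        print(f"{name}: cycle {cyc}, pipe {pipe}")
```

In this sketch the independent add `p` issues ahead of the older, stalled add `n`, which is the behavior an out-of-order scheduler is meant to provide; the in-order dispatch and retire handled by the front end are not modeled.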
