Analysis of Reciprocal and Square Root Reciprocal Instructions in the AMD K6-2 Implementation of 3DNow!

Abstract: The AMD 3DNow! instruction set specifies reciprocal and root reciprocal functions at "half" precision and in IEEE single precision format. This report analyzes their implementations in the recently released AMD K6-2 microprocessor, using exhaustive computation to ascertain the accuracy and monotonicity of the output and timing loops to measure throughput and latency cycle counts. Periodicities observed in the stepwise function output were used to construct an underlying bipartite table that can serve as the core of each reciprocal function. The recommended RISC instruction macros generated single precision reciprocals and root reciprocals accurate to one unit in the last place. However, the root reciprocal functions failed to satisfy the monotonicity property that is typically treated as an industry standard for elementary functions on x86 floating point units. Reasons for the failure are provided, and an adjusted table is shown to satisfy the monotonicity standard. Results are summarized in Table 1 and described in the body of this report.
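To illustrate what a bipartite table and an exhaustive accuracy and monotonicity check look like in practice, the following is a minimal sketch for the reciprocal on [1, 2). It is not the K6-2 table: the bit partition (`n0`, `n1`, `n2`), the entry width (`out_bits`), and the function names `build_tables`, `lookup`, and `exhaustive_check` are all hypothetical choices made for illustration. The approximation is the sum of two lookups, one indexed by the high and middle input bits and one by the high and low input bits.

```python
# Illustrative bipartite reciprocal table; all parameters are assumptions,
# not values recovered from the K6-2.

def build_tables(n0=4, n1=4, n2=4, out_bits=16):
    """Build tables a0[x0:x1] and a1[x0][x2] approximating 1/x on [1, 2)."""
    n = n0 + n1 + n2
    ulp = 2.0 ** -n                      # spacing of the n-bit input fraction
    scale = 2.0 ** out_bits              # entries are rounded to out_bits fractional bits
    half_span = ((1 << n2) - 1) / 2.0    # middle of the x2 range, in input ulps
    # a0: 1/x evaluated at the middle of the x2 span for each (x0, x1) prefix.
    a0 = []
    for hi in range(1 << (n0 + n1)):
        x_mid = 1.0 + ((hi << n2) + half_span) * ulp
        a0.append(round(scale / x_mid) / scale)
    # a1: first-order correction, using one slope per x0 block.
    a1 = []
    for x0 in range(1 << n0):
        x_block_mid = 1.0 + (x0 + 0.5) * 2.0 ** -n0
        slope = -1.0 / (x_block_mid ** 2)        # derivative of 1/x
        a1.append([round(scale * slope * (x2 - half_span) * ulp) / scale
                   for x2 in range(1 << n2)])
    return a0, a1

def lookup(i, a0, a1, n0=4, n1=4, n2=4):
    """Approximate 1/(1 + i * 2**-(n0+n1+n2)) as the sum of two table entries."""
    hi = i >> n2                         # x0 and x1 bits together
    x0 = i >> (n1 + n2)
    x2 = i & ((1 << n2) - 1)
    return a0[hi] + a1[x0][x2]

def exhaustive_check(n0=4, n1=4, n2=4, out_bits=16):
    """Evaluate every input; return (max absolute error, monotonicity violations)."""
    n = n0 + n1 + n2
    a0, a1 = build_tables(n0, n1, n2, out_bits)
    outputs = [lookup(i, a0, a1, n0, n1, n2) for i in range(1 << n)]
    max_err = max(abs(y - 1.0 / (1.0 + i * 2.0 ** -n))
                  for i, y in enumerate(outputs))
    # 1/x is decreasing, so any increase between neighbours is a violation.
    violations = sum(1 for a, b in zip(outputs, outputs[1:]) if b > a)
    return max_err, violations

if __name__ == "__main__":
    err, bad = exhaustive_check()
    print("max abs error:", err, "monotonicity violations:", bad)
```

Because the whole input space of such a table is small, both properties can be checked by enumeration rather than by analysis, which is the style of test the abstract describes. Changing the partition or the entry width and re-running the check shows how either choice affects the error and the violation count.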
