Matrix calculations with SIMD floating point instructions on x86 processors

This paper describes and evaluates the use of SIMD floating point instructions for scientific calculations. The performance of these instructions is compared with that of ordinary floating point code, using matrix multiplication as the benchmark for execution speed. Implementation concerns, the effects of loop unrolling, and the effects of varying the matrix size are analyzed. Because the SIMD floating point implementations of different manufacturers are intrinsically incompatible, two different instruction sets are required: 3DNow! on the AMD K6 processor and the Streaming SIMD Extensions (SSE) on the Intel Pentium III processor.

Keywords: SIMD, 3DNow!, SSE, vector operations, performance evaluation.