Software floating-point computation on parallel machines

This thesis examines how to optimize the performance of software floating-point (FP) operations on parallel architectures. In particular, it explores the instruction-level parallelism (ILP) of FP operations, proposes optimization techniques, and develops efficient algorithms. In our method, FP operations, such as FP add, are decomposed into a set of primitive integer and logic operations, such as integer adds and shifts, and the primitive operations are then scheduled on a parallel architecture. The algorithms for fast division and square root computation also enable the hardware FP unit to be clocked at a faster rate. The design and analysis of such a system are detailed, and the system is tested on Raw, a software-exposed parallel architecture. Results show that the division and square root implementations achieve reasonable performance compared to a hardware FP unit.

Thesis Supervisor: Anant Agarwal
Title: Professor
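
To illustrate the decomposition described in the abstract, the following is a minimal sketch of an FP add built only from integer compares, masks, shifts, and adds. It is not the implementation developed in this thesis: the function names (`soft_fadd_pos`, `f2u`, `u2f`) are hypothetical, and the sketch assumes IEEE 754 single precision, positive normal operands, truncation instead of round-to-nearest, and no handling of zeros, subnormals, infinities, NaNs, or exponent overflow.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back (hypothetical helpers). */
static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Sketch: add two POSITIVE, NORMAL single-precision values using only
 * integer and logic primitives. Assumptions: result is truncated, and
 * special values and exponent overflow are not handled. */
static float soft_fadd_pos(float a, float b)
{
    uint32_t x = f2u(a), y = f2u(b);

    /* For positive floats, bit-pattern order equals magnitude order,
     * so an integer compare selects the larger operand. */
    if (x < y) { uint32_t t = x; x = y; y = t; }

    /* Unpack: the sign bit is 0, so a shift alone isolates the exponent. */
    uint32_t ex = x >> 23;
    uint32_t ey = y >> 23;
    uint32_t mx = (x & 0x7FFFFFu) | 0x800000u;  /* restore implicit 1 */
    uint32_t my = (y & 0x7FFFFFu) | 0x800000u;

    /* Align: shift the smaller significand right by the exponent gap.
     * The guard avoids an undefined shift by 32 or more bits. */
    uint32_t d = ex - ey;
    my = (d < 32u) ? (my >> d) : 0u;

    /* Add significands; the sum occupies at most 25 bits. */
    uint32_t sum = mx + my;

    /* Normalize: on carry out of bit 23, shift right and bump the exponent. */
    if (sum & 0x1000000u) { sum >>= 1; ex += 1u; }

    /* Pack: drop the implicit 1 and reassemble the bit pattern. */
    return u2f((ex << 23) | (sum & 0x7FFFFFu));
}
```

Each step (compare, unpack, align, add, normalize, pack) is a short chain of integer operations. The two unpack sequences, for instance, are independent of each other, which is the kind of parallelism a scheduler can exploit when mapping such primitives onto a parallel architecture.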