Precision Adaptation for Fast and Accurate Polynomial Evaluation Generation

Polynomial evaluation is a critical part of the efficient floating-point approximation of elementary functions, in software as well as in FPGA-based systems. Designing an optimized polynomial evaluation scheme is a complex and tedious task because of the many choices to be made along several dimensions. The evaluation scheme, such as Horner or Estrin, must be selected according to implementation goals (latency, throughput, accuracy, ...) and adapted to a given architecture, for example by matching the level of parallelism to the architecture's capabilities. For each operation, a fixed-point or floating-point format must be chosen, for example binary32 or binary64. Furthermore, some schemes and formats induce compromises, in particular for vectorized evaluation schemes. As part of a longer automated code generation toolchain, the generation of polynomial evaluation code is likely to be invoked repeatedly. Several aspects of polynomial evaluation have been presented before, such as code generation for Horner schemes with floating-point expansions, or the optimization of polynomial evaluation schemes. In this work we study both the combination and the extension of these techniques, striving for their integration in a code generator. In particular, we present an algorithm within the Metalibm-lugdunum code generation framework, based on input from Metalibm-lutetia. Our intent is to offer state-of-the-art multi-word evaluation, combined with polynomial scheme space exploration with CGPE, Gappa correctness proofs, and advanced code generation suited for High-Level Synthesis.
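
To make the scheme trade-off concrete, the following minimal Python sketch contrasts the two schemes named above. It is an illustration only, not the algorithm of this work and not code produced by Metalibm or CGPE; the function names `horner` and `estrin` and the coefficient layout (lowest degree first, non-empty list) are assumptions made for the example.

```python
from fractions import Fraction


def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with Horner's rule.

    Every multiply-add depends on the previous one, so a degree-n
    polynomial forms a chain of n sequential steps.
    """
    acc = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        acc = acc * x + c
    return acc


def estrin(coeffs, x):
    """Evaluate the same polynomial with Estrin's scheme.

    Coefficients are combined pairwise; the operations inside one level
    are independent of each other, so they can run in parallel and the
    dependency depth grows only logarithmically with the degree.
    """
    terms = list(coeffs)
    power = x  # holds x, then x**2, then x**4, ...
    while len(terms) > 1:
        if len(terms) % 2:
            terms.append(0)  # pad so that terms can be paired
        terms = [terms[i] + terms[i + 1] * power
                 for i in range(0, len(terms), 2)]
        power = power * power
    return terms[0]


# Exact rational arithmetic: both schemes compute the same value.
p = [1, 2, 3, 4, 5]  # 1 + 2x + 3x^2 + 4x^3 + 5x^4
x = Fraction(1, 2)
assert horner(p, x) == estrin(p, x) == Fraction(57, 16)
```

In exact arithmetic the two schemes agree. In floating-point arithmetic they perform different sequences of roundings, so their results and error bounds may differ, which is one reason why scheme selection and per-operation format selection interact.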