SIMDizing pairwise sums: a summation algorithm balancing accuracy with throughput
Implementing summation when accuracy and throughput must be balanced is a challenging endeavour. We present experimental results that give a sense of when to start worrying about accuracy and of what the various existing solutions cost. We also present a new algorithm based on pairwise summation. When the data is not resident in L1 cache, it achieves 89% of the throughput of the fastest summation algorithms, while eclipsing the accuracy of significantly slower compensated sums, such as Kahan summation and Kahan-Babuska summation, that are typically used when accuracy is important.
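To make the accuracy comparison concrete, the following is a minimal scalar sketch of the three textbook approaches the abstract refers to: naive accumulation, Kahan (compensated) summation, and pairwise summation. It is not the SIMD algorithm presented in the paper. The function names, the `block` cutoff of 128, and the test data are assumptions chosen for illustration, and `math.fsum` is used as the correctly rounded reference.

```python
import math


def naive_sum(xs):
    """Left-to-right accumulation; rounding error can grow linearly with len(xs)."""
    total = 0.0
    for x in xs:
        total += x
    return total


def kahan_sum(xs):
    """Kahan compensated summation: carry the rounding error of each addition."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for x in xs:
        y = x - comp
        t = total + y
        comp = (t - total) - y  # recovers what was rounded away in t
        total = t
    return total


def pairwise_sum(xs, lo=0, hi=None, block=128):
    """Pairwise summation: split the range in half and add the two partial sums.

    Ranges of at most `block` elements are summed naively. The cutoff value
    is an illustrative assumption, not a figure taken from the paper.
    """
    if hi is None:
        hi = len(xs)
    if hi - lo <= block:
        total = 0.0
        for i in range(lo, hi):
            total += xs[i]
        return total
    mid = lo + (hi - lo) // 2
    return pairwise_sum(xs, lo, mid, block) + pairwise_sum(xs, mid, hi, block)


if __name__ == "__main__":
    data = [0.1] * 1_000_000  # hypothetical test input
    exact = math.fsum(data)   # correctly rounded reference sum
    for name, fn in [("naive", naive_sum), ("kahan", kahan_sum), ("pairwise", pairwise_sum)]:
        print(f"{name:>8}: abs error = {abs(fn(data) - exact):.3e}")
```

The naive base case inside `pairwise_sum` is what keeps the recursion overhead low. In a vectorized implementation, that inner loop is the natural place for SIMD accumulators, although how the paper structures this is not shown here.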