Techniques for the automatic debugging of scientific floating-point programs

Large-scale scientific applications have grown rapidly over the past several years. Consequently, anomalies that previously had only a minor impact may now significantly affect the numerical results of these programs and their implications [1]. The work presented here proposes automatic techniques to reduce the cost of locating and remedying a wide class of numerical anomalies arising in single- and multi-threaded applications. Common examples include rounding errors that accumulate excessively over the course of a numerical program, conditional branches on floating-point comparisons that go astray because of the subtleties of floating-point arithmetic, anomalies due to the vagaries of programming languages, overflow, and benign or catastrophic cancellation, among others. When suspected, such anomalies can be located using various techniques, including: altering the rounding mode of the floating-point hardware and observing the sensitivity of the program to the change; increasing the precision of some floating-point operations (using high or even infinite precision) and observing the impact on the final result; modifying comparisons by adding a non-obvious tolerance; or using interval arithmetic. These techniques vary in cost, scope, and effectiveness. Because anomalies due to roundoff are difficult to debug [2], we wish to offer developers whose expertise does not extend to numerical error analysis an intelligent tool for debugging floating-point programs. This tool should help locate and remedy suspected anomalies automatically, working both on source code and at runtime, to shorten debugging time and thus improve programmer productivity.
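As an illustration of one of the techniques listed above (raising the precision of a computation and observing the impact on the result), the following is a minimal sketch in C. It is not taken from the tool described here; the function names `sum_single`, `sum_double`, and `precision_sensitivity` are hypothetical and chosen only for this example. It accumulates the same value in single and in double precision and reports the relative discrepancy between the two results.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n copies of x with a single-precision accumulator.
 * 'volatile' forces every partial sum to be rounded to float,
 * even on hardware that keeps intermediates in wider registers. */
float sum_single(float x, size_t n)
{
    volatile float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc = acc + x;
    return acc;
}

/* Same computation with the accumulator promoted to double. */
double sum_double(float x, size_t n)
{
    volatile double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc = acc + (double)x;
    return acc;
}

/* Hypothetical sensitivity measure: relative difference between the
 * single- and double-precision results. A large value suggests that
 * the computation suffers from accumulated rounding error. */
double precision_sensitivity(float x, size_t n)
{
    assert(n > 0);
    double lo = (double)sum_single(x, n);
    double hi = sum_double(x, n);
    double diff  = lo > hi ? lo - hi : hi - lo;
    double scale = hi < 0.0 ? -hi : hi;
    return scale > 0.0 ? diff / scale : diff;
}
```

Assuming IEEE 754 single and double precision, calling `precision_sensitivity(0.1f, 10000000)` reports a discrepancy of several percent, because 0.1 is not exactly representable and the rounding error of each addition grows with the accumulator. Calling `precision_sensitivity(0.5f, 1000)` reports zero, since every partial sum is exactly representable in both formats.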
Our tool comprises a set of transformations, including increasing precision (using double- instead of single-precision floating-point arithmetic), changing the rounding mode, and switching between two implementations of the same computation. These transformations are applied by instrumenting the original code with CIL [3], which allows a given C code to be analysed and transformed. The parts of a program that are most sensitive to a given transformation can then be isolated automatically using delta debugging [4]. This algorithm works like a binary search to determine a locally minimal