Mesa: automatic generation of lookup table optimizations

Scientific programmers constantly strive to meet performance demands. Tuning is often done manually, despite the significant development time and effort it requires. One example is lookup table (LUT) optimization, a technique that is generally applied by hand because methodology and tools are lacking. LUT methods reduce execution time by replacing computations with memory accesses to precomputed tables of results. A LUT optimization improves performance when the memory access is faster than the original computation and the level of reuse is sufficient to amortize the cost of LUT initialization. Current practice requires programmers to inspect program source to identify candidate expressions and then develop specific LUT code for each optimization. Measurement of LUT accuracy is usually ad hoc, and the interaction with multicore parallelization has not been explored. In this paper we present Mesa, a standalone tool that implements error analysis and code generation to improve the process of LUT optimization. We evaluate Mesa on a multicore system using a molecular biology application and other scientific expressions. Our LUT optimizations realize a performance improvement of 5X for the application and up to 45X for the expressions, while tightly controlling error. We also show that the serial optimization is just as effective on a parallel version of the application. Our research provides a methodology and tool for incorporating LUT optimizations into existing scientific code.
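To make the trade-off between table size and accuracy concrete, the following is a minimal Python sketch of a hand-written LUT optimization. It is an illustration only and does not reproduce Mesa's generated code or its error analysis: the target expression `exp(-x)`, the domain [0, 10], the nearest-entry lookup scheme, and all function names (`build_lut`, `lut_eval`, `max_abs_error`) are assumptions chosen for the example.

```python
import math

def build_lut(func, lo, hi, size):
    """Precompute func at `size` evenly spaced points on [lo, hi]."""
    step = (hi - lo) / (size - 1)
    table = [func(lo + i * step) for i in range(size)]
    return table, step

def lut_eval(table, lo, step, x):
    """Approximate func(x) by returning the nearest precomputed entry."""
    i = int((x - lo) / step + 0.5)        # round to the nearest grid index
    i = max(0, min(len(table) - 1, i))    # clamp out-of-range inputs
    return table[i]

def max_abs_error(func, table, lo, hi, step, samples=10_000):
    """Empirically measure the worst absolute error over sampled inputs."""
    worst = 0.0
    for k in range(samples):
        x = lo + (hi - lo) * k / (samples - 1)
        worst = max(worst, abs(func(x) - lut_eval(table, lo, step, x)))
    return worst

def f(x):
    # Hypothetical "expensive" expression standing in for a real candidate.
    return math.exp(-x)

LO, HI = 0.0, 10.0
small, small_step = build_lut(f, LO, HI, 101)      # grid spacing 0.1
large, large_step = build_lut(f, LO, HI, 10_001)   # grid spacing 0.001

err_small = max_abs_error(f, small, LO, HI, small_step)
err_large = max_abs_error(f, large, LO, HI, large_step)
print(f"101 entries:    max abs error = {err_small:.2e}")
print(f"10001 entries:  max abs error = {err_large:.2e}")
```

For nearest-entry lookup the error is bounded by half the grid spacing times the largest slope of the function on the domain, so a larger table lowers the error at the cost of more memory and a longer initialization. The table pays off only if it is queried often enough to amortize that initialization, which is the reuse condition stated in the abstract.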
