Static reuse distances for locality-based optimizations in MATLAB

Modeling the memory locality of applications to guide compiler optimizations systematically is an important unsolved problem, made even more significant by the advent of multi-core and many-core architectures. We describe an approach based on a novel source-level metric, called static reuse distance, to model the memory behavior of applications written in MATLAB. We use MATLAB as a representative language that lets end users express their algorithms precisely, but at a relatively high level. MATLAB's "high-level" characteristics allow the static analysis to focus on large objects, such as arrays, without losing accuracy due to the processor-specific layout of scalar values in memory. We present an efficient algorithm to compute static reuse distances using an extended version of dependence graphs. Our approach differs from earlier similar attempts in three important respects: it targets high-level programming systems characterized by heavy use of libraries; it works on full programs instead of being confined to loops; and it integrates practical mechanisms to handle separately compiled procedures as well as pre-compiled library procedures that are available only in binary form. We study MATLAB code taken from real programs to demonstrate the effectiveness of our model. Finally, we present some applications of our approach to program transformations that are known to be important in MATLAB and are expected to be relevant to other similar high-level languages as well.
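To make the notion of reuse distance concrete, the following minimal Python sketch computes the classical trace-based reuse distance at array granularity: for each access, the number of distinct other arrays touched since the previous access to the same array. It is not the static, dependence-graph-based algorithm described above; the function name `reuse_distances` and the example trace are hypothetical and serve only as illustration.

```python
def reuse_distances(trace):
    """Return the reuse distance of every access in a trace of array names.

    The reuse distance of an access is the number of distinct other names
    accessed since the previous access to the same name. A first access has
    no previous use, so its distance is reported as None.
    Assumption: this is the classical trace-based definition, used here only
    to illustrate the metric, not the static analysis itself.
    """
    last_seen = {}   # name -> index of its most recent access
    distances = []
    for i, name in enumerate(trace):
        if name in last_seen:
            # Distinct names touched strictly between the two accesses.
            between = set(trace[last_seen[name] + 1:i])
            distances.append(len(between))
        else:
            distances.append(None)
        last_seen[name] = i
    return distances


# Hypothetical access trace over three arrays A, B, and C.
trace = ["A", "B", "C", "A", "B", "B"]
print(reuse_distances(trace))  # [None, None, None, 2, 2, 0]
```

Small distances indicate that an array is likely to still be in cache when it is reused. A static variant would have to estimate the same quantity from the program text rather than from an observed trace.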
