Program Composition and Optimization: An Introduction

Software composition connects separately defined software artifacts. Such connections may occur in program structure (e.g., inheritance), in data flow (e.g., message passing), and/or in control flow (e.g., function calls or loop control).
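The following is a minimal Python sketch of the three kinds of connection named above. All names (`Filter`, `Scale`, `producer`, `consumer`, `run_pipeline`) are hypothetical and used only for illustration. A single-threaded `queue.Queue` stands in for a real message-passing channel.

```python
from __future__ import annotations

import queue


class Filter:
    """Hypothetical base component: transforms one value (identity by default)."""

    def apply(self, x: float) -> float:
        return x


class Scale(Filter):
    # Composition in program structure: Scale specializes Filter via inheritance.
    def __init__(self, factor: float) -> None:
        self.factor = factor

    def apply(self, x: float) -> float:
        return self.factor * x


def producer(out: queue.Queue, values: list[float]) -> None:
    # Composition in data flow: the producer only sends messages;
    # it never calls the consumer directly.
    for v in values:
        out.put(v)
    out.put(None)  # sentinel marking the end of the stream


def consumer(inbox: queue.Queue, stage: Filter) -> list[float]:
    results: list[float] = []
    # Loop control: the consumer's loop decides when to stop reading.
    while True:
        msg = inbox.get()
        if msg is None:
            break
        # Composition in control flow: a function call into another component.
        results.append(stage.apply(msg))
    return results


def run_pipeline(values: list[float], factor: float) -> list[float]:
    """Wire the artifacts together: producer -> queue -> consumer -> Scale."""
    channel: queue.Queue = queue.Queue()
    producer(channel, values)
    return consumer(channel, Scale(factor))


if __name__ == "__main__":
    print(run_pipeline([1.0, 2.0, 3.0], 2.0))  # [2.0, 4.0, 6.0]
```

Each component here is defined independently. Only `run_pipeline` decides how they are connected, which is the separation that composition relies on.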
