A fast and accurate method for determining a lower bound on execution time
Michael F. P. O'Boyle | Grigori Fursin | Olivier Temam | Gregory Watts
[1] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[2] Scott A. Mahlke,et al. Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 25.
[3] D. Burger,et al. Memory Bandwidth Limitations of Future Microprocessors , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[4] Laszlo A. Belady,et al. A Study of Replacement Algorithms for a Virtual-Storage Computer , 1966, IBM Syst. J.
[5] Scott Devine,et al. Using the SimOS machine simulator to study complex computer systems , 1997, TOMC.
[6] Olivier Temam,et al. Investigating optimal local memory performance , 1998, ASPLOS VIII.
[7] Todd M. Austin,et al. The SimpleScalar tool set, version 2.0 , 1997, CARN.
[8] Sharad Malik,et al. Cache miss equations: a compiler framework for analyzing and tuning memory behavior , 1999, TOPL.
[9] Michael D. Smith,et al. Overcoming the Challenges to Feedback-Directed Optimization , 2000, Dynamo.
[10] Kathryn S. McKinley,et al. Tile size selection using cache organization and data layout , 1995, PLDI '95.
[11] Olivier Temam,et al. A quantitative analysis of loop nest locality , 1996, ASPLOS VII.
[12] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[13] Michael E. Wolf,et al. Combining Loop Transformations Considering Caches and Scheduling , 2004, International Journal of Parallel Programming.
[14] David A. Patterson,et al. Computer Architecture - A Quantitative Approach, 5th Edition , 1996 .
[15] Boleslaw K. Szymanski,et al. Program Optimization Based on Compile-Time Cache Performance Prediction , 1996, Parallel Process. Lett.
[16] Scott A. Mahlke,et al. The superblock: An effective technique for VLIW and superscalar compilation , 1993, The Journal of Supercomputing.
[17] P. Geoffrey Lowney,et al. Feedback directed optimization in Compaq's compilation tools for Alpha , 1999 .
[18] Markus Mock,et al. Calpa: a tool for automating dynamic compilation , 1999 .
[19] Lance M. Berc,et al. Continuous profiling: where have all the cycles gone? , 1997, ACM Trans. Comput. Syst.
[20] Chau-Wen Tseng,et al. A Comparison of Compiler Tiling Algorithms , 1999, CC.
[21] Michael F. P. O'Boyle,et al. Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation , 2004, The Journal of Supercomputing.
[22] Mateo Valero,et al. Static locality analysis for cache management , 1997, Proceedings 1997 International Conference on Parallel Architectures and Compilation Techniques.
[23] James R. Goodman,et al. The declining effectiveness of dynamic caching for general-purpose microprocessors , 1995 .
[24] Santosh G. Abraham,et al. Efficient simulation of caches under optimal replacement with applications to miss characterization , 1993, SIGMETRICS '93.
[25] Keshav Pingali,et al. Access normalization: loop restructuring for NUMA compilers , 1992, ASPLOS V.
[26] Chau-Wen Tseng,et al. Software Support For Improving Locality in Scientific Codes , 2001 .
[27] Richard E. Kessler,et al. The Alpha 21264 microprocessor architecture , 1998, Proceedings International Conference on Computer Design. VLSI in Computers and Processors (Cat. No.98CB36273).
[28] Doug Burger,et al. Measuring Experimental Error in Microprocessor Simulation , 2001, ISCA 2001.