The performance impact analysis of loop unrolling
[1] J.L. Ayala,et al. Optimal loop-unrolling mechanisms and architectural extensions for an energy-efficient design of shared register files in MPSoCs , 2005, Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA'05).
[2] Alexander Aiken,et al. Perfect Pipelining: A New Loop Parallelization Technique , 1988, ESOP.
[3] Mark Stephenson,et al. Predicting unroll factors using supervised classification , 2005, International Symposium on Code Generation and Optimization.
[4] Preeti Ranjan Panda,et al. The Impact of Loop Unrolling on Controller Delay in High Level Synthesis , 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.
[5] J. C. Huang,et al. Generalized loop-unrolling: a method for program speedup , 1999, Proceedings 1999 IEEE Symposium on Application-Specific Systems and Software Engineering and Technology. ASSET'99 (Cat. No.PR00122).
[6] Paul Lokuciejewski,et al. Combining Worst-Case Timing Models, Loop Unrolling, and Static Loop Analysis for WCET Minimization , 2009, 2009 21st Euromicro Conference on Real-Time Systems.
[7] Markus Kowarschik,et al. An Overview of Cache Optimization Techniques and Cache-Aware Numerical Algorithms , 2002, Algorithms for Memory Hierarchies.
[8] Alexandru Nicolau,et al. Loop Quantization: A Generalized Loop Unwinding Technique , 1988, J. Parallel Distributed Comput.
[9] Yong Dou,et al. Impact of Loop Unrolling on Area, Throughput and Clock Frequency for Window Operations Based on a Data Schedule Method , 2008, 2008 Congress on Image and Signal Processing.
[10] V. Strassen. Gaussian elimination is not optimal , 1969.
[11] Sasko Ristov,et al. Some optimization techniques of the matrix multiplication algorithm , 2013, Proceedings of the ITI 2013 35th International Conference on Information Technology Interfaces.
[12] Don Coppersmith,et al. Matrix multiplication via arithmetic progressions , 1987, STOC.
[13] P. Sadayappan,et al. Optimal loop unrolling for GPGPU programs , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[14] Todor Stefanov,et al. Optimal Loop Unrolling and Shifting for Reconfigurable Architectures , 2009, TRETS.
[15] Dean M. Tullsen,et al. The effect of compiler optimizations on Pentium 4 power consumption , 2003, Seventh Workshop on Interaction Between Compilers and Computer Architectures, 2003. INTERACT-7 2003. Proceedings.
[16] Philip H. Sweany,et al. Optimizing loop performance for clustered VLIW architectures , 2002, Proceedings. International Conference on Parallel Architectures and Compilation Techniques.
[17] Sasko Ristov,et al. Hybrid 2D/1D Blocking as Optimal Matrix-Matrix Multiplication , 2012, ICT Innovations.
[18] A. Jefferson Offutt,et al. Using compiler optimization techniques to detect equivalent mutants , 1994, Softw. Test. Verification Reliab.
[19] Sasko Ristov,et al. Matrix multiplication performance analysis in virtualized shared memory multiprocessor , 2012, 2012 Proceedings of the 35th International Convention MIPRO.
[20] Sasko Ristov,et al. Affinity-aware HPC applications in multichip and multicore multiprocessor , 2013, Proceedings of the ITI 2013 35th International Conference on Information Technology Interfaces.
[21] Geng Liu,et al. Algorithm and Data Optimization Techniques for Scaling to Massively Threaded Systems , 2012, Computer.
[22] Allen,et al. Optimizing Compilers for Modern Architectures , 2004.
[23] S. Hiroyuki,et al. Characteristics of loop unrolling effect: software pipelining and memory latency hiding , 2001, 2001 Innovative Architecture for Future Generation High-Performance Processors and Systems.
[24] Virginia Vassilevska Williams,et al. Multiplying matrices faster than Coppersmith-Winograd , 2012, STOC '12.
[25] Sasko Ristov,et al. Loosely or tightly coupled affinity for matrix-vector multiplication , 2013, 2013 36th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO).
[26] Koen Bertels,et al. Loop distribution for K-loops on Reconfigurable Architectures , 2011, 2011 Design, Automation & Test in Europe.
[27] Vania Marangozova-Martin,et al. BOAST: Bringing Optimization through Automatic Source-to-Source Transformations , 2013, 2013 IEEE 7th International Symposium on Embedded Multicore Socs.
[28] Peter Luksch,et al. An Improving Method for Loop Unrolling , 2013, ArXiv.
[29] Michael F. P. O'Boyle,et al. Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation , 2000, Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00622).
[30] M. Gusev,et al. Achieving maximum performance for matrix multiplication using set associative cache , 2012, 2012 8th International Conference on Computing Technology and Information Management (NCM and ICNIT).