Iterational retiming with partitioning: Loop scheduling with complete memory latency hiding
[1] Edwin Hsing-Mean Sha, et al. Loop scheduling and partitions for hiding memory latencies, 1999, Proceedings 12th International Symposium on System Synthesis.
[2] Nader Bagherzadeh, et al. Modeled and Measured Instruction Fetching Performance for Superscalar Microprocessors, 1998, IEEE Trans. Parallel Distributed Syst.
[3] Edwin Hsing-Mean Sha, et al. Iterational retiming: maximize iteration-level parallelism for nested loops, 2005, 2005 Third IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'05).
[4] Srivaths Ravi, et al. High-level synthesis of distributed logic-memory architectures, 2002, ICCAD 2002.
[5] Shlomit S. Pinter, et al. Tango: a hardware-based data prefetching technique for superscalar processors, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[6] Edwin Hsing-Mean Sha, et al. Optimizing Overall Loop Schedules Using Prefetching and Partitioning, 2000, IEEE Trans. Parallel Distributed Syst.
[7] Kai Li, et al. Thread scheduling for cache locality, 1996, ASPLOS VII.
[8] Edwin Hsing-Mean Sha, et al. Partitioning and Scheduling DSP Applications with Maximal Memory Access Hiding, 2002, EURASIP J. Adv. Signal Process.
[9] Naraig Manjikian, et al. Combining loop fusion with prefetching on shared-memory multiprocessors, 1997, Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162).
[10] Ricardo Bianchini, et al. Data prefetching for software DSMs, 1998, ICS '98.
[11] Jean-Loup Baer, et al. A performance study of software and hardware data prefetching schemes, 1994, ISCA '94.
[12] Seung Ryoul Maeng, et al. An adaptive sequential prefetching scheme in shared-memory multiprocessors, 1997, Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162).
[13] H. Peter Hofstee, et al. Introduction to the Cell multiprocessor, 2005, IBM J. Res. Dev.
[14] Charles E. Leiserson, et al. Retiming synchronous circuitry, 1988, Algorithmica.
[15] Michel Dubois, et al. Sequential Hardware Prefetching in Shared-Memory Multiprocessors, 1995, IEEE Trans. Parallel Distributed Syst.
[16] Yoji Yamada, et al. Data relocation and prefetching for programs with large data sets, 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.
[17] Edwin Hsing-Mean Sha, et al. Rotation scheduling: a loop pipelining algorithm, 1997, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.
[18] T. Ozawa, et al. Cache miss heuristics and preloading techniques for general-purpose programs, 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[19] Edwin Hsing-Mean Sha, et al. Rotation Scheduling: A Loop Pipelining Algorithm, 1993, 30th ACM/IEEE Design Automation Conference.
[20] Edwin Hsing-Mean Sha, et al. Scheduling and partitioning for multiple loop nests, 2001, International Symposium on System Synthesis (IEEE Cat. No.01EX526).
[21] Edwin Hsing-Mean Sha, et al. Scheduling of uniform multidimensional systems under resource constraints, 1998, IEEE Trans. Very Large Scale Integr. Syst.
[22] Michel Dubois, et al. Hybrid compiler/hardware prefetching for multiprocessors using low-overhead cache miss traps, 1997, Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162).
[23] Kathryn S. McKinley, et al. Guided region prefetching: a cooperative hardware/software approach, 2003, ISCA '03.
[24] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.
[25] Mikko H. Lipasti, et al. Cache miss heuristics and preloading techniques for general-purpose programs, 1995, MICRO 28.
[26] Anant Agarwal, et al. Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors, 1995, IEEE Trans. Parallel Distributed Syst.