Optimizing loop performance for clustered VLIW architectures
Philip H. Sweany | Yi Qian | Steve Carr
[1] D. A. Reed, et al. An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs, 1995, Proceedings of the IEEE/ACM SC95 Conference.
[2] Antonio González, et al. Graph-partitioning based instruction scheduling for clustered processors, 2001, MICRO.
[3] Philip H. Sweany, et al. Improving software pipelining with unroll-and-jam, 1996, Proceedings of HICSS-29: 29th Hawaii International Conference on System Sciences.
[4] David A. Poplawski. The unlimited resource machine (URM), 1995.
[5] A. Gonzalez, et al. Graph-partitioning based instruction scheduling for clustered processors, 2001, Proceedings of the 34th ACM/IEEE International Symposium on Microarchitecture (MICRO-34).
[6] Philip H. Sweany, et al. Register assignment for software pipelining with partitioned register banks, 2000, Proceedings of the 14th International Parallel and Distributed Processing Symposium (IPDPS 2000).
[7] Ken Kennedy, et al. RETROSPECTIVE: Coloring Heuristics for Register Allocation, 2022.
[8] Javier Zalamea, et al. Modulo scheduling with integrated register spilling for clustered VLIW architectures, 2001, MICRO.
[9] Alexandre E. Eichenberger, et al. Effective cluster assignment for modulo scheduling, 1998, Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[10] F. Jesús Sánchez Navarro, et al. Instruction scheduling for clustered VLIW architectures, 2000.
[11] Antonio González, et al. The effectiveness of loop unrolling for modulo scheduling in clustered VLIW architectures, 2000, Proceedings of the 2000 International Conference on Parallel Processing.
[12] Ken Kennedy, et al. Estimating Interlock and Improving Balance for Pipelined Architectures, 1988, J. Parallel Distributed Comput.
[13] Hewlett-Packard, et al. Iterative Modulo Scheduling: An Algorithm For Software, 1997.
[14] Ken Kennedy, et al. Improving the ratio of memory operations to floating-point operations in loops, 1994, TOPLAS.
[15] Philip H. Sweany, et al. Value cloning for architectures with partitioned register banks, 1998.
[16] Nikil D. Dutt, et al. Partitioned register files for VLIWs: a preliminary analysis of tradeoffs, 1992, MICRO 25.
[17] Ken Kennedy, et al. Scalar replacement in the presence of conditional control flow, 1994, Softw. Pract. Exp.
[18] Yi Qian, et al. Loop transformations for clustered VLIW architectures, 2002.
[19] Monica S. Lam, et al. RETROSPECTIVE: Software Pipelining: An Effective Scheduling Technique for VLIW Machines, 1998.
[20] Philip H. Sweany, et al. Loop Transformations for Architectures with Partitioned Register Banks, 2001, OM '01.
[21] Antonio González, et al. A unified modulo scheduling and register allocation technique for clustered processors, 2001, Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques.