Classifying and alleviating the communication overheads in matrix computations on large-scale NUMA multiprocessors
[1] L.M. Ni,et al. Trapezoid Self-Scheduling: A Practical Scheduling Scheme for Parallel Compilers , 1993, IEEE Trans. Parallel Distributed Syst..
[2] Ricardo Bianchini,et al. The MIT Alewife machine: architecture and performance , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[3] Guy Lemieux,et al. The NUMAchine multiprocessor , 2000, Proceedings 2000 International Conference on Parallel Processing.
[4] Ricardo Bianchini,et al. Software interleaving , 1994, Proceedings of 1994 6th IEEE Symposium on Parallel and Distributed Processing.
[5] Yi-Min Wang,et al. Clustered affinity scheduling on large-scale NUMA multiprocessors , 1997, J. Syst. Softw..
[6] J. Ramanujam,et al. Tiling of Iteration Spaces for Multicomputers , 1990, ICPP.
[7] Yi-Min Wang,et al. A Minimal Synchronization Overhead Affinity Scheduling Algorithm for Shared-Memory Multiprocessors , 1995, Int. J. High Speed Comput..
[8] Evangelos P. Markatos,et al. Shared memory vs. message passing in shared-memory multiprocessors , 1992, Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing.
[9] Hui Li,et al. Locality and Loop Scheduling on NUMA Multiprocessors , 1993, 1993 International Conference on Parallel Processing - ICPP'93.
[10] Constantine D. Polychronopoulos,et al. Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers , 1987, IEEE Transactions on Computers.
[11] Jack E. Veenstra,et al. Mint Tutorial and User Manual , 1993 .
[12] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[13] E. P. Markatos,et al. Shared-Memory Multiprocessor Trends and the Implications for Parallel Program Performance , 1992 .
[14] Robert J. Fowler,et al. MINT: a front end for efficient simulation of shared-memory multiprocessors , 1994, Proceedings of International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.
[15] Using Processor Affinity in Loop Scheduling on Shared Memory Multiprocessors , 1994 .
[16] Michael Stumm,et al. Hector - A Hierarchically Structured , 1991 .
[17] Anoop Gupta,et al. The DASH prototype: implementation and performance , 1992, ISCA '92.
[18] Mateo Valero,et al. Multiple-banked register file architectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[19] Kenneth C. Sevcik,et al. Hot spot analysis in large scale shared memory multiprocessors , 1993, Supercomputing '93. Proceedings.
[20] Evangelos P. Markatos,et al. Using processor affinity in loop scheduling on shared-memory multiprocessors , 1992, Supercomputing '92.
[21] Cezary Dubnicki. The effects of block size on the performance of coherent caches in shared-memory multiprocessors , 1993 .