The Illinois Aggressive COMA Multiprocessor project (I-ACOMA)
[1] Lawrence Rauchwerger,et al. The privatizing DOALL test: a run-time technique for DOALL loop identification and array privatization , 1994, ICS '94.
[2] Lawrence Rauchwerger,et al. Effective Automatic Parallelization with Polaris , 1995 .
[3] P. Sadayappan,et al. An approach to synchronization for parallel computing , 1988, ICS '88.
[4] Joel H. Saltz,et al. Run-time parallelization and scheduling of loops , 1989, SPAA '89.
[5] Josep Torrellas,et al. Data forwarding in scalable shared-memory multiprocessors , 1995, ICS '95.
[6] David A. Padua,et al. Compiler Algorithms for Synchronization , 1987, IEEE Transactions on Computers.
[7] Josep Torrellas,et al. Data Forwarding in Scalable Shared-Memory Multiprocessors , 1996, IEEE Trans. Parallel Distributed Syst..
[8] Geoffrey C. Fox,et al. The Perfect Club Benchmarks: Effective Performance Evaluation of Supercomputers , 1989, Int. J. High Perform. Comput. Appl..
[9] Lawrence Rauchwerger,et al. The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization , 1995, PLDI '95.
[10] Josep Torrellas,et al. Speeding up irregular applications in shared-memory multiprocessors: memory binding and group prefetching , 1995, ISCA.
[11] John Zahorjan,et al. Improving the performance of runtime parallelization , 1993, PPOPP '93.
[12] Andreas Nowatzyk,et al. Missing the Memory Wall: The Case for Processor/Memory Integration , 1996, ISCA.
[13] Anoop Gupta,et al. Comparative performance evaluation of cache-coherent NUMA and COMA architectures , 1992, ISCA '92.
[14] Pen-Chung Yew,et al. Data Prefetching and Data Forwarding in Shared Memory Multiprocessors , 1994, 1994 International Conference on Parallel Processing Vol. 2.
[15] Josep Torrellas,et al. Optimizing instruction cache performance for operating system intensive workloads , 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.
[16] Josep Torrellas,et al. The Augmint multiprocessor simulation toolkit for Intel x86 architectures , 1996, Proceedings International Conference on Computer Design. VLSI in Computers and Processors.
[17] Ding-Kai Chen,et al. An Efficient Algorithm for the Run-time Parallelization of Doacross Loops , 1994 .
[18] Peter M. Kogge,et al. EXECUBE-A New Architecture for Scaleable MPPs , 1994, 1994 International Conference on Parallel Processing Vol. 1.
[19] Jim Gray,et al. Advantages of COMA , 1995 .
[20] Erik Hagersten,et al. DDM - A Cache-Only Memory Architecture , 1992, Computer.
[21] Chuan-Qi Zhu,et al. A Scheme to Enforce Data Dependence on Large Multiprocessor Systems , 1987, IEEE Transactions on Software Engineering.
[22] Fong Pong,et al. Missing the Memory Wall: The Case for Processor/Memory Integration , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[23] Josep Torrellas,et al. An efficient algorithm for the run-time parallelization of DOACROSS loops , 1994, Proceedings of Supercomputing '94.
[24] Dean M. Tullsen,et al. Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[25] Josep Torrellas,et al. Instruction Prefetching of Systems Codes with Layout Optimized for Reduced Cache Misses , 1996, ISCA.
[26] Anoop Gupta,et al. Cache Invalidation Patterns in Shared-Memory Multiprocessors , 1992, IEEE Trans. Computers.
[27] John B. Carter,et al. An argument for simple COMA , 1995, Future Gener. Comput. Syst..
[28] Anders Landin,et al. Bus-based COMA-reducing traffic in shared-bus multiprocessors , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.