Parallel Communication Mechanisms for Sparse, Irregular Applications
[1] T. von Eicken, et al. Parallel programming in Split-C, 1993, Supercomputing '93.
[2] William J. Dally, et al. The M-machine multicomputer, 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[3] Anoop Gupta, et al. SPLASH: Stanford parallel applications for shared-memory, 1992, CARN.
[4] Brian W. Kernighan, et al. An efficient heuristic procedure for partitioning graphs, 1970, Bell Syst. Tech. J.
[5] John L. Hennessy, et al. The performance advantages of integrating block data transfer in cache-coherent multiprocessors, 1994, ASPLOS VI.
[6] J. Meijerink, et al. An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix, 1977.
[7] James R. Larus, et al. Application-specific protocols for user-level shared memory, 1994, Proceedings of Supercomputing '94.
[8] T. W. Mathews, et al. Analysis of performance accelerator running ETMSP. Final report, 1993.
[9] Duncan Roweth. The Meiko CS-2 system architecture, 1993, SPAA '93.
[10] Donald Yeung, et al. The MIT Alewife machine: architecture and performance, 1995, ISCA '95.
[11] Wilson C. Hsieh, et al. Optimistic active messages: a mechanism for scheduling communication with computation, 1995, PPOPP '95.
[12] James R. Larus, et al. Efficient support for irregular applications on distributed-memory machines, 1995, PPOPP '95.
[13] Shubhendu S. Mukherjee, et al. Coherent Network Interfaces for Fine-Grain Communication, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA '96).
[14] I. Gustafsson. On modified incomplete Cholesky factorization methods for the solution of problems with mixed boundary conditions and problems with discontinuous material coefficients, 1979.
[15] Seth Copen Goldstein, et al. Evaluation of mechanisms for fine-grained parallel programs in the J-machine and the CM-5, 1993, ISCA '93.
[16] Remzi H. Arpaci-Dusseau, et al. Empirical evaluation of the CRAY-T3D: a compiler perspective, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[17] Andrea C. Arpaci-Dusseau, et al. Parallel programming in Split-C, 1993, Supercomputing '93. Proceedings.
[18] David E. Culler, et al. Assessing the benefits of fine-grain parallelism in dataflow programs, 1988, Supercomputing '88. Proceedings.
[19] Michael D. Noakes, et al. The J-machine multicomputer: an architectural evaluation, 1993, ISCA '93.
[20] Steven L. Scott. Synchronization and communication in the T3E multiprocessor, 1996, ASPLOS VII.
[21] W. Daniel Hillis, et al. The Network Architecture of the Connection Machine CM-5, 1996, J. Parallel Distributed Comput.
[22] Anant Agarwal, et al. Anatomy of a message in the Alewife multiprocessor, 1993, ICS '93.
[23] Leslie Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, 1979, IEEE Transactions on Computers.
[24] Ricardo Bianchini, et al. Application Performance on the MIT Alewife Multiprocessor, 1996.
[25] Eric A. Brewer, et al. Remote queues: exposing message queues for optimization and atomicity, 1995, SPAA '95.
[26] Shahid H. Bokhari, et al. A Partitioning Strategy for PDEs Across Multiprocessors, 1985, ICPP.
[27] Steven A. Moyer, et al. Performance of the iPSC/860 Node Architecture, 1991.
[28] Anoop Gupta, et al. The Stanford Dash multiprocessor, 1992, Computer.
[29] T. H. Dunigan. Communication performance of the Intel Touchstone DELTA mesh, 1992.
[30] Seth Copen Goldstein, et al. Active messages: a mechanism for integrated communication and computation, 1992, ISCA '92.
[31] Fernando L. Alvarado, et al. Optimal Parallel Solution of Sparse Triangular Systems, 1993, SIAM J. Sci. Comput.
[32] J. M. Aarden, et al. Preconditioned CG-type methods for solving the coupled system of fundamental semiconductor equations, 1989.
[33] Mark T. Jones, et al. BlockSolve v1.1: Scalable Library Software for the Parallel Solution of Sparse Linear Systems, 1993.
[34] Frederic T. Chong, et al. METRO: a router architecture for high-performance, short-haul routing networks, 1994, ISCA '94.
[35] Evangelos P. Markatos, et al. Shared memory vs. message passing in shared-memory multiprocessors, 1992, Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing.
[36] Anant Agarwal, et al. FUGU: Implementing Translation and Protection in a Multiuser, Multimodel Multiprocessor, 1994.
[37] Eric A. Brewer, et al. How to get good performance from the CM-5 data network, 1994, Proceedings of 8th International Parallel Processing Symposium.
[38] R. Schreiber, et al. Highly Parallel Sparse Triangular Solution, 1994.
[39] Lawrence Snyder, et al. A Comparison of Programming Models for Shared Memory Multiprocessors, 1990, ICPP.
[40] James R. Larus, et al. Tempest and Typhoon: user-level shared memory, 1994, ISCA '94.
[41] Chris J. Scheiman, et al. Experience with active messages on the Meiko CS-2, 1995, Proceedings of 9th International Parallel Processing Symposium.
[42] Shreekant S. Thakkar, et al. The Symmetry Multiprocessor System, 1988, ICPP.
[43] Anant Agarwal, et al. LimitLESS directories: A scalable cache coherence scheme, 1991, ASPLOS IV.
[44] M. J. Beckerle, et al. *T: integrated building blocks for parallel computing, 1993, Supercomputing '93.
[45] N. Madsen. Divergence preserving discrete surface integral methods for Maxwell's curl equations using non-orthogonal unstructured grids, 1995.
[46] I. Duff, et al. The effect of ordering on preconditioned conjugate gradients, 1989.
[47] Thomas H. Dunigan. Kendall Square Multiprocessor: Early Experiences and Performance, 1992.
[48] Owe Axelsson, et al. A survey of preconditioned iterative methods for linear systems of algebraic equations, 1985.
[49] Anoop Gupta, et al. The performance impact of flexibility in the Stanford FLASH multiprocessor, 1994, ASPLOS VI.
[50] S. K. Reinhardt, et al. Decoupled Hardware Support for Distributed Shared Memory, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA '96).
[51] Robert, et al. Parallel Sparse Triangular Solution with Partitioned Inverses and Prescheduled, 1995.
[52] James R. Larus, et al. Where is time spent in message-passing and shared-memory programs?, 1994, ASPLOS VI.
[53] Anoop Gupta, et al. An efficient block-oriented approach to parallel sparse Cholesky factorization, 1993, Supercomputing '93. Proceedings.
[54] Donald Yeung, et al. Experience with fine-grain synchronization in MIMD machines for preconditioned conjugate gradient, 1993, PPOPP '93.
[55] Margaret Martonosi, et al. Tradeoffs in Message Passing and Shared Memory Implementations of a Standard Cell Router, 1989, ICPP.
[56] Bruce Hendrickson, et al. The Chaco user's guide. Version 1.0, 1993.