Shared-memory multiprocessing: Current state and future directions
Erik Hagersten | Margaret Martonosi | Per Stenström | David J. Lilja | Madan Venugopal
[1] Message Passing Interface Forum,et al. MPI: A Message-Passing Interface Standard , 1994 .
[2] Mats Brorsson,et al. An adaptive cache coherence protocol optimized for migratory sharing , 1993, ISCA '93.
[3] Josep Torrellas,et al. Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[4] Erik Hagersten,et al. Simple COMA node implementations , 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.
[5] Per Stenström,et al. Using dataflow analysis techniques to reduce ownership overhead in cache coherence protocols , 1996, TOPL.
[6] Sarita V. Adve,et al. Shared Memory Consistency Models: A Tutorial , 1996, Computer.
[7] John K. Ousterhout. Scheduling Techniques for Concurrent Systems , 1982, ICDCS 1982.
[8] Fong Pong,et al. Missing the Memory Wall: The Case for Processor/Memory Integration , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[9] Ruben W. Castelino,et al. Internal Organization of the Alpha 21164, a 300-MHz 64-bit Quad-issue CMOS RISC Microprocessor , 1995, Digit. Tech. J..
[10] Anoop Gupta,et al. Complete computer system simulation: the SimOS approach , 1995, IEEE Parallel Distributed Technol. Syst. Appl..
[11] Alan L. Cox,et al. Software versus hardware shared-memory implementation: a case study , 1994, ISCA '94.
[12] Josep Torrellas,et al. The memory performance of DSS commercial workloads in shared-memory multiprocessors , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[13] Margaret Martonosi,et al. Integrating performance monitoring and communication in parallel computers , 1996, SIGMETRICS '96.
[14] Barton P. Miller,et al. IPS-2: The Second Generation of a Parallel Program Measurement System , 1990, IEEE Trans. Parallel Distributed Syst..
[15] Allan Gottlieb. Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, Australia, May 1992 , 1992, ISCA.
[16] Christoforos E. Kozyrakis,et al. A case for intelligent RAM , 1997, IEEE Micro.
[17] Monica S. Lam,et al. Global optimizations for parallelism and locality on scalable parallel machines , 1993, PLDI '93.
[18] Stein Gjessing,et al. Distributed-directory scheme: scalable coherent interface , 1990, Computer.
[19] Kemal Ebcioglu,et al. An efficient resource-constrained global scheduling technique for superscalar and VLIW processors , 1992, MICRO 1992.
[20] David J. Lilja,et al. Coarse-grained speculative execution in shared-memory multiprocessors , 1998, ICS '98.
[21] Anoop Gupta,et al. The Stanford Dash multiprocessor , 1992, Computer.
[22] R.E. Johnson,et al. Evaluation of Multithreaded Uniprocessors for Commercial Application Environments , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[23] Monica S. Lam,et al. Maximizing Multiprocessor Performance with the SUIF Compiler , 1996, Digit. Tech. J..
[24] James R. Larus,et al. Cooperative shared memory: software and hardware for scalable multiprocessors , 1993, TOCS.
[25] Paul Feautrier,et al. A New Solution to Coherence Problems in Multicache Systems , 1978, IEEE Transactions on Computers.
[26] B.P. Miller. DPM: A Measurement System for Distributed Programs , 1988, IEEE Trans. Computers.
[27] Håkan Grahn,et al. SimICS/Sun4m: A Virtual Workstation , 1998, USENIX Annual Technical Conference.
[28] Yale N. Patt,et al. One Billion Transistors, One Uniprocessor, One Chip , 1997, Computer.
[29] R. Sarnath,et al. Proceedings of the International Conference on Parallel Processing , 1992 .
[30] Jenn-Yuan Tsai,et al. The superthreaded architecture: thread pipelining with run-time data dependence checking and control speculation , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique.
[31] Erik Hagersten,et al. DDM - A Cache-Only Memory Architecture , 1992, Computer.
[32] Mary K. Vernon,et al. The performance of multiprogrammed multiprocessor scheduling algorithms , 1990, SIGMETRICS '90.
[33] Mark D. Hill,et al. Multiprocessors Should Support Simple Memory-Consistency Models , 1998, Computer.
[34] Andrew Gilliam Tucker,et al. Efficient Scheduling on Multiprogrammed Shared-Memory Multiprocessors , 1994 .
[35] Per Stenström,et al. Reducing Contention in Shared-Memory Multiprocessors , 1988, Computer.
[36] Mary W. Hall,et al. Interprocedural Parallelization Analysis: A Case Study , 1995, PPSC.
[37] Paul Fischer,et al. A commercial CFD application on a shared memory multiprocessor using MPI , 1996 .
[38] Håkan Grahn,et al. Evaluation of a Competitive-Update Cache Coherence Protocol with Migratory Data Detection , 1996, J. Parallel Distributed Comput..
[39] Anoop Gupta,et al. Working sets, cache sizes, and node granularity issues for large-scale multiprocessors , 1993, ISCA '93.
[40] Jack J. Dongarra,et al. Performance of various computers using standard linear equations software in a FORTRAN environment , 1988, CARN.
[41] David A. Wood,et al. Multicast snooping: a new coherence method using a multicast address network , 1999, ISCA.
[42] Monica S. Lam,et al. Efficient context-sensitive pointer analysis for C programs , 1995, PLDI '95.
[43] H. Grahn,et al. Efficient strategies for software-only directory protocols in shared-memory multiprocessors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[44] V. K. Naik,et al. Performance analysis of job scheduling policies in parallel supercomputing environments , 1993, Supercomputing '93.
[45] Nawaf Bitar,et al. A Scalable Multi-Discipline, Multiple-Processor Scheduling Framework for IRIX , 1995, JSSPP.
[46] Michel Dubois,et al. Boosting the Performance of Shared Memory Multiprocessors , 1997, Computer.
[47] Robert P. Colwell,et al. A VLIW architecture for a trace scheduling compiler , 1987, ASPLOS 1987.
[48] Josep Torrellas,et al. Reducing remote conflict misses: NUMA with remote cache versus COMA , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[49] Anoop Gupta,et al. Comparative performance evaluation of cache-coherent NUMA and COMA architectures , 1992, ISCA '92.
[50] Todd C. Mowry,et al. Tolerating latency through software-controlled data prefetching , 1994 .
[51] Steven Brawer,et al. An Introduction to Parallel Programming , 1989 .
[52] Kevin O'Brien,et al. Single-program speculative multithreading (SPSM) architecture: compiler-assisted fine-grained multithreading , 1995, PACT.
[53] Per Stenström,et al. A Survey of Cache Coherence Schemes for Multiprocessors , 1990, Computer.
[54] James H. Patterson,et al. Portable Programs for Parallel Processors , 1987 .
[55] Andrea C. Arpaci-Dusseau,et al. Searching for the sorting record: experiences in tuning NOW-Sort , 1998, SPDT '98.
[56] Alan E. Charlesworth,et al. Starfire: extending the SMP envelope , 1998, IEEE Micro.
[57] David J. Lilja,et al. Complexity and performance in parallel programming languages , 1997, Proceedings Second International Workshop on High-Level Parallel Programming Models and Supportive Environments.
[58] Pen-Chung Yew,et al. A Scheme to Enforce Data Dependence on Large Multiprocessor Systems , 1987, IEEE Trans. Software Eng..
[59] Anoop Gupta,et al. Cache Invalidation Patterns in Shared-Memory Multiprocessors , 1992, IEEE Trans. Computers.
[60] T. Brewer,et al. The evolution of the HP/Convex Exemplar , 1997, Proceedings IEEE COMPCON 97. Digest of Papers.
[61] Ken Kennedy,et al. The ParaScope editor: an interactive parallel programming tool , 1993, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[62] Margaret Martonosi,et al. Characterizing the Memory Behavior of Compiler-Parallelized Applications , 1996, IEEE Trans. Parallel Distributed Syst..
[63] Zary Segall,et al. Visualizing performance debugging , 1989, Computer.
[64] Kozo Kimura,et al. An elementary processor architecture with simultaneous instruction issuing from multiple threads , 1992, ISCA '92.
[65] Kunle Olukotun,et al. A Single-Chip Multiprocessor , 1997, Computer.
[66] Per Stenström,et al. The Scalable Tree Protocol-a cache coherence approach for large-scale multiprocessors , 1992, [1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing.
[67] Sanjay Sharma,et al. Impact of Loop Granularity and Self-Preemption on the Performance of Loop Parallel Applications on a Multiprogrammed Shared-Memory Multiprocessor , 1994, 1994 International Conference on Parallel Processing Vol. 2.
[68] T. Lovett,et al. STiNG: A CC-NUMA Computer System for the Commercial Marketplace , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[69] Margaret Martonosi,et al. Performance monitoring in a Myrinet-connected SHRIMP cluster , 1998, SPDT '98.
[70] James E. Smith,et al. Trace Processors: Moving to Fourth-Generation Microarchitectures , 1997, Computer.
[71] Dean M. Tullsen,et al. Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[72] Josep Torrellas,et al. An efficient algorithm for the run-time parallelization of DOACROSS loops , 1994, Proceedings of Supercomputing '94.
[73] Per Stenström,et al. A prefetching technique for irregular accesses to linked data structures , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[74] Luiz André Barroso,et al. Memory system characterization of commercial workloads , 1998, ISCA.
[75] Alan Jay Smith. Proceedings of the 20th Annual International Symposium on Computer Architecture, San Diego, CA, USA, May 1993 , 1993, ISCA.
[76] Joe Throop. OpenMP: Shared-Memory Parallelism From the Ashes , 1999, Computer.
[77] Barr E. Bauer. Practical parallel programming , 1992 .
[78] Jack L. Lo,et al. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[79] Anoop Gupta,et al. Operating system support for improving data locality on CC-NUMA compute servers , 1996, ASPLOS VII.
[80] Vivek Sarkar,et al. Baring It All to Software: Raw Machines , 1997, Computer.
[81] Jian Huang,et al. The Superthreaded Processor Architecture , 1999, IEEE Trans. Computers.
[82] Lawrence Rauchwerger,et al. The privatizing DOALL test: a run-time technique for DOALL loop identification and array privatization , 1994, ICS '94.
[83] NCUBE Corporation. The NCUBE family of high-performance parallel computer systems , 1988, C3P.
[84] W. E. Nagel. 1988 International conference on supercomputing , 1988 .
[85] Todd C. Mowry,et al. Compiler-based prefetching for recursive data structures , 1996, ASPLOS VII.
[86] David J. Lilja,et al. Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons , 1993, CSUR.
[87] Margaret Martonosi,et al. Adaptive parallelism in compiler‐parallelized code , 1998 .
[88] Anoop Gupta,et al. SPLASH: Stanford parallel applications for shared-memory , 1992, CARN.
[89] Donald Yeung,et al. The MIT Alewife machine: architecture and performance , 1995, ISCA '98.
[90] Maurice J. Bach. The Design of the UNIX Operating System , 1986 .
[91] Rajeev Barua,et al. Maps: a compiler-managed memory system for raw machines , 1999, ISCA.
[92] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[93] Anoop Gupta,et al. Parallel computer architecture - a hardware / software approach , 1998 .
[94] Rudolf Eigenmann,et al. Parallel programming with message passing and directives , 2001, Comput. Sci. Eng..
[95] Per Stenström,et al. The Cachemire Test Bench A Flexible And Effective Approach For Simulation Of Multiprocessors , 1993, [1993] Proceedings 26th Annual Simulation Symposium.
[96] David A. Patterson,et al. Proceedings of the 22nd annual international symposium on Computer architecture , 1995, ISCA.
[97] Richard J. Enbody,et al. Automatic Self-Allocating Threads (ASAT) on the Convex Exemplar , 1995, ICPP.
[98] David J. Lilja,et al. Efficient execution of parallel applications in multiprogrammed multiprocessor systems , 1996, Proceedings of International Conference on Parallel Processing.
[99] Robert J. Fowler,et al. Adaptive cache coherency for detecting migratory shared data , 1993, ISCA '93.
[100] Margaret Martonosi,et al. Informing memory operations: memory performance feedback mechanisms and their applications , 1998, TOCS.
[101] Anoop Gupta,et al. Comparative evaluation of latency reducing and tolerating techniques , 1991, ISCA '91.
[102] Anthony Skjellum,et al. Using MPI - portable parallel programming with the message-passing interface , 1994 .
[103] Daniel E. Lenoski,et al. Scalable Shared-Memory Multiprocessing , 1995 .
[104] Joel H. Saltz,et al. Resource‐aware metacomputing , 1997 .
[105] Ann Marie Grizzaffi Maynard,et al. Contrasting characteristics and cache performance of technical and multi-user commercial workloads , 1994, ASPLOS VI.
[106] Mateo Valero,et al. Multiple-banked register file architectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[107] Shashi Shekhar,et al. Parallelizing a GIS on a Shared Address Space Architecture , 1996, Computer.
[108] Anoop Gupta,et al. Competitive management of distributed shared memory , 1989, Digest of Papers. COMPCON Spring 89. Thirty-Fourth IEEE Computer Society International Conference: Intellectual Leverage.
[109] Clifford C. Huff,et al. Elements of a realistic CASE tool adoption budget , 1992, CACM.
[110] Yale N. Patt,et al. A comparison of dynamic branch predictors that use two levels of branch history , 1993, ISCA '93.
[111] Steven W. K. Tjiang,et al. SUIF: an infrastructure for research on parallelizing and optimizing compilers , 1994, SIGP.