Exploiting data locality in adaptive architectures

Processor speed is increasing much faster than memory access time is improving, which makes memory accesses relatively expensive. To address this problem, cache hierarchies are introduced to serve the processor with d ...
