Data Locality Exploitation in the Decomposition of Regular Domain Problems
[1] Prithviraj Banerjee, et al. Techniques to overlap computation and communication in irregular iterative applications, 1994, ICS '94.
[2] Wei Li, et al. Unifying data and control transformations for distributed shared-memory machines, 1995, PLDI '95.
[3] Steven L. Scott, et al. Synchronization and communication in the T3E multiprocessor, 1996, ASPLOS VII.
[4] Ian T. Foster, et al. Designing and building parallel programs: concepts and tools for parallel software engineering, 1995.
[5] Sally A. McKee, et al. Hitting the memory wall: implications of the obvious, 1995, CARN.
[6] Francisco Tirado, et al. Parallel resolution of alternating-line processes by means of pipelining techniques, 1999, Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing (PDP'99).
[7] Zhiwei Xu, et al. Modeling communication overhead: MPI and MPL performance on the IBM SP2, 1996, IEEE Parallel & Distributed Technology: Systems & Applications.
[8] Mark D. Hill, et al. Making Network Interfaces Less Peripheral, 1998, Computer.
[9] Fong Pong, et al. Missing the Memory Wall: The Case for Processor/Memory Integration, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[10] Francisco Tirado, et al. Partitioning Regular Domains on Modern Parallel Computers, 1998, VECPAR.
[11] Monica S. Lam, et al. Data and computation transformations for multiprocessors, 1995, PPOPP '95.
[12] Michael M. Resch, et al. Performance of MPI on the CRAY T3E-512, 1997.
[13] Jack J. Dongarra, et al. Software Libraries for Linear Algebra Computations on High Performance Computers, 1995, SIAM Review.
[14] Francisco Tirado, et al. Solution of alternating-line processes on modern parallel computers, 1999, Proceedings of the 1999 International Conference on Parallel Processing.
[15] Anthony J. G. Hey, et al. Selected Results from the ParkBench Benchmark, 1996, Euro-Par, Vol. II.
[16] D. Lenoski, et al. The SGI Origin: A ccNUMA Highly Scalable Server, 1997, 24th Annual International Symposium on Computer Architecture (ISCA'97).
[17] Aad J. van der Steen, et al. A Performance Analysis of the SGI Origin2000, 1998, VECPAR.
[18] Ulrich Rüde, et al. Iterative Algorithms on High Performance Architectures, 1997, Euro-Par.
[19] Francisco Tirado, et al. Distributed parallel computers versus PVM on a workstation cluster in the simulation of time dependent partial differential equations, 1995, Proceedings of the Euromicro Workshop on Parallel and Distributed Processing.
[20] Anthony J. G. Hey, et al. Message-Passing Performance of Parallel Computers, 1997, Euro-Par.
[21] Chau-Wen Tseng, et al. Compiler optimizations for improving data locality, 1994, ASPLOS VI.
[22] Francisco Tirado, et al. Message Passing Evaluation and Analysis on Cray T3E and SGI Origin 2000 Systems, 1999, Euro-Par.
[23] Anoop Gupta, et al. Parallel computer architecture: a hardware/software approach, 1998.
[24] Agustin Arruabarrena, et al. Parallel architectures: Assessing the performance of the new IBM SP2 communication subsystem, 1996, IEEE Parallel & Distributed Technology: Systems & Applications.
[25] Ken Kennedy, et al. GIVE-N-TAKE: a balanced code placement framework, 1994, PLDI '94.
[26] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, 1994.
[27] Nenad Nedeljkovic, et al. Data distribution support on distributed shared memory multiprocessors, 1997, PLDI '97.
[28] Sudhakar Yalamanchili, et al. Interconnection Networks: An Engineering Approach, 2002.
[29] Francisco Tirado, et al. Impact of PE Mapping on Cray T3E Message-Passing Performance, 2000, Euro-Par.
[30] Francisco Tirado, et al. Relationships Between Efficiency and Execution Time of Full Multigrid Methods on Parallel Computers, 1997, IEEE Transactions on Parallel and Distributed Systems.
[31] E. Anderson, et al. Performance of the CRAY T3E Multiprocessor, 1997, ACM/IEEE SC 1997 Conference (SC'97).