Performance analysis of multi‐level parallelism: inter‐node, intra‐node and hardware accelerators
[1] H. Burau, et al. PIConGPU: A Fully Relativistic Particle-in-Cell Code for a GPU Cluster, 2010, IEEE Transactions on Plasma Science.
[2] Sally A. McKee, et al. Hitting the memory wall: implications of the obvious, 1995, CARN.
[3] Matthias S. Müller, et al. The Vampir Performance Analysis Tool-Set, 2008, Parallel Tools Workshop.
[4] Toni Cortes, et al. PARAVER: A Tool to Visualize and Analyze Parallel Code, 2007.
[5] Wolfgang E. Nagel, et al. Performance Optimization for Large Scale Computing: The Scalable VAMPIR Approach, 2001, International Conference on Computational Science.
[6] Wolfgang E. Nagel, et al. Event Tracing and Visualization for Cell Broadband Engine Systems, 2008, Euro-Par.
[7] Jack J. Dongarra, et al. Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization, 2008, IEEE Transactions on Parallel and Distributed Systems.
[8] Michael Lang, et al. Entering the petaflop era: The architecture and performance of Roadrunner, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[9] Allen D. Malony, et al. An experimental approach to performance measurement of heterogeneous parallel applications using CUDA, 2010, ICS '10.
[10] Matthias S. Müller, et al. Developing Scalable Applications with Vampir, VampirServer and VampirTrace, 2007, PARCO.
[11] Guido Juckeland, et al. Comprehensive Performance Tracking with Vampir 7, 2009, Parallel Tools Workshop.
[12] Teofilo F. Gonzalez, et al. Performance data collection using a hybrid approach, 2005, ESEC/FSE-13.
[13] Wolfgang E. Nagel, et al. Introducing the Open Trace Format (OTF), 2006, International Conference on Computational Science.
[14] Guido Juckeland, et al. Non-intrusive Performance Analysis of Parallel Hardware Accelerated Applications on Hybrid Architectures, 2010, 2010 39th International Conference on Parallel Processing Workshops.