High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation
[1] Ken Kennedy, et al. Software prefetching, 1991, ASPLOS IV.
[2] Arch D. Robison, et al. Structured Parallel Programming: Patterns for Efficient Computation, 2012.
[3] Steven A. Hofmeyr, et al. Oversubscription on multicore processors, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[4] Todd C. Mowry, et al. Automatic Compiler-Inserted Prefetching for Pointer-Based Applications, 1999, IEEE Trans. Computers.
[5] M. Hosomi, et al. A novel nonvolatile memory with spin torque transfer magnetization switching: spin-ram, 2005, IEEE International Electron Devices Meeting, 2005. IEDM Technical Digest.
[6] B. J. Muga, et al. Particle-in-Cell Method, 1970.
[7] Onur Mutlu, et al. Coordinated control of multiple prefetchers in multi-core systems, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[8] Margaret H. Wright, et al. The opportunities and challenges of exascale computing, 2010.
[9] Stephen L. Olivier, et al. Comparison of OpenMP 3.0 and Other Task Parallel Frameworks on Unbalanced Task Graphs, 2010, International Journal of Parallel Programming.
[10] Alex Ramírez, et al. The low-power architecture approach towards exascale computing, 2011, ScalA '11.
[11] Samuel Williams, et al. Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[12] Jean-Loup Baer, et al. A performance study of software and hardware data prefetching schemes, 1994, ISCA '94.
[13] Duncan A. Grove, et al. Communication Benchmarking and Performance Modelling of MPI Programs on Cluster Computers, 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[14] Alice Koniges, et al. Application Acceleration on Current and Future Cray Platforms, 2010.
[15] Fang Liu, et al. Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors, 2011, SIGMETRICS '11.
[16] James Reinders, et al. Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism, 2007.
[17] Martin Schulz, et al. Exploring Traditional and Emerging Parallel Programming Models Using a Proxy Application, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[18] Douglas Thain, et al. Qthreads: An API for programming with millions of lightweight threads, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[19] Xingfu Wu, et al. Performance Modeling of Hybrid MPI/OpenMP Scientific Applications on Large-scale Multicore Cluster Systems, 2011, 2011 14th IEEE International Conference on Computational Science and Engineering.
[20] Vijayalakshmi Srinivasan, et al. When prefetching improves/degrades performance, 2005, CF '05.
[21] Norman P. Jouppi, et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[22] Thomas L. Sterling, et al. ParalleX: An Advanced Parallel Execution Model for Scaling-Impaired Applications, 2009, 2009 International Conference on Parallel Processing Workshops.
[23] Nathan R. Tallent, et al. HPCTOOLKIT: tools for performance analysis of optimized parallel programs, 2010, Concurr. Comput. Pract. Exp.
[24] Guido Torelli, et al. A Bipolar-Selected Phase Change Memory Featuring Multi-Level Cell Storage, 2009, IEEE Journal of Solid-State Circuits.
[25] Mark M. Mathis, et al. A performance model of non-deterministic particle transport on large-scale systems, 2003, Future Gener. Comput. Syst.
[26] Robert B. Ross, et al. Modeling a Million-Node Dragonfly Network Using Massively Parallel Discrete-Event Simulation, 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[27] Cyriel Minkenberg, et al. Trace-driven co-simulation of high-performance computing systems using OMNeT++, 2009, SimuTools.
[28] Collin McCurdy, et al. Memphis: Finding and fixing NUMA-related performance problems on multi-core platforms, 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[29] George Bosilca, et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, 2004, PVM/MPI.
[30] Collin McCurdy, et al. Diagnosis and optimization of application prefetching performance, 2013, ICS '13.
[31] Wei-Chung Hsu, et al. Data Prefetching On The HP PA-8000, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[32] Onur Mutlu, et al. Prefetch-aware shared-resource management for multi-core systems, 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[33] Robert H. Halstead, et al. MULTILISP: a language for concurrent symbolic computation, 1985, ACM Trans. Program. Lang. Syst.
[34] Onur Mutlu, et al. Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers, 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[35] Yiran Chen, et al. Compact modeling and corner analysis of spintronic memristor, 2009, 2009 IEEE/ACM International Symposium on Nanoscale Architectures.