GPU Acceleration for Simulating Massively Parallel Many-Core Platforms
[1] Luca Benini,et al. Supporting OpenMP on a multi-cluster embedded MPSoC , 2011, Microprocess. Microsystems.
[2] Report,et al. Public International Benchmarks for Parallel Computers , 1993 .
[3] Todd M. Austin,et al. The SimpleScalar tool set, version 2.0 , 1997, CARN.
[4] David A. Bader,et al. Guest Editor's Introduction: Special Issue on High-Performance Computing with Accelerators , 2011, IEEE Trans. Parallel Distributed Syst..
[5] Laxmikant V. Kalé,et al. BigSim: a parallel simulator for performance prediction of extremely large parallel machines , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..
[6] Coniferous softwood. GENERAL TERMS , 2003 .
[7] Valeria Bertacco,et al. Event-driven gate-level simulation with GP-GPUs , 2009, 2009 46th ACM/IEEE Design Automation Conference.
[8] Leonid Ryzhyk,et al. The ARM Architecture , 2006 .
[9] Michael Frumkin,et al. The OpenMP Implementation of NAS Parallel Benchmarks and its Performance , 2013 .
[10] James E. Smith,et al. Modeling superscalar processors via statistical simulation , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[11] Sreedhar B. Kodali,et al. The Asynchronous Partitioned Global Address Space Model , 2010 .
[12] Michael Adler,et al. HAsim: FPGA-based high-detail multicore simulation using time-division multiplexing , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[13] George Kurian,et al. Graphite: A distributed parallel simulator for multicores , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[14] Kai Lu,et al. The TianHe-1A Supercomputer: Its Hardware and Software , 2011, Journal of Computer Science and Technology.
[15] Luca Benini,et al. Analysis of Evolving SoC Interconnect Protocols , 2004 .
[16] Sudhakar Yalamanchili,et al. Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[17] David Wentzlaff,et al. Processor: A 64-Core SoC with Mesh Interconnect , 2010 .
[18] Arturo González-Escribano,et al. The OpenMP source code repository , 2005, 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing.
[19] Vladimir Getov,et al. PARKBENCH Report -1: Public International Benchmarks for Parallel Computers, Technical Report: UT-CS-93-213 , 1994 .
[20] Mateo Valero,et al. From Plasma to BeeFarm: Design Experience of an FPGA-Based Multicore Prototype , 2011, ARC.
[21] Lieven Eeckhout,et al. Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[22] Wang Zhiqiang,et al. Using GPU to Accelerate Cache Simulation , 2009, 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications.
[23] Babak Falsafi,et al. ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs , 2009, TRETS.
[24] Christoforos E. Kozyrakis,et al. RAMP: Research Accelerator for Multiple Processors , 2007, IEEE Micro.
[25] Luca Benini,et al. Scalable instruction set simulator for thousand-core architectures running on GPGPUs , 2010, 2010 International Conference on High Performance Computing & Simulation.
[26] J. Xu. OpenCL – The Open Standard for Parallel Programming of Heterogeneous Systems , 2009 .
[27] Franco Fummi,et al. A timing-accurate HW/SW cosimulation of an ISS with SystemC , 2004, International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004..
[28] Luciano Lavagno,et al. Software performance estimation strategies in a system-level design tool , 2000, Proceedings of the Eighth International Workshop on Hardware/Software Codesign. CODES 2000 (IEEE Cat. No.00TH8518).
[29] Sunil P. Khatri,et al. Towards acceleration of fault simulation using Graphics Processing Units , 2008, 2008 45th ACM/IEEE Design Automation Conference.
[30] Katherine Yelick,et al. Introduction to UPC and Language Specification , 2000 .
[31] Timothy Mattson,et al. A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).
[32] David Defour,et al. Barra: A Parallel Functional Simulator for GPGPU , 2010, 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.
[33] Luca Benini,et al. Full system simulation of many-core heterogeneous SoCs using GPU and QEMU semihosting , 2012, GPGPU-5.
[34] David I. August,et al. Exploiting parallelism and structure to accelerate the simulation of chip multi-processors , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..
[35] Fredrik Larsson,et al. Simics: A Full System Simulation Platform , 2002, Computer.
[36] Luca Benini,et al. GPGPU-Accelerated Parallel and Fast Simulation of Thousand-Core Platforms , 2011, 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[37] Matt T. Yourst. PTLsim: A Cycle Accurate Full System x86-64 Microarchitectural Simulator , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.
[38] J. M. Bull,et al. Measuring Synchronisation and Scheduling Overheads in OpenMP , 2007 .
[39] Luca Benini,et al. Platform 2012, a many-core computing accelerator for embedded SoCs: Performance evaluation of visual analytics applications , 2012, DAC Design Automation Conference 2012.
[40] Wu-chun Feng,et al. Inter-block GPU communication via fast barrier synchronization , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[41] Paolo Faraboschi,et al. COTSon: infrastructure for full system simulation , 2009, OPSR.
[42] Shekhar Y. Borkar,et al. Thousand Core Chips: A Technology Perspective , 2007, 2007 44th ACM/IEEE Design Automation Conference.
[43] P. Dovgalyuk,et al. Two ways of organizing a full-system deterministic replay mechanism in the QEMU simulator , 2012 .