Toward a Microarchitecture for Efficient Execution of Irregular Applications
[1] Keith D. Cooper,et al. Improvements to graph coloring register allocation , 1994, TOPLAS.
[2] Keith D. Cooper,et al. Stochastic instruction scheduling , 2000 .
[3] Kevin Skadron,et al. Scalable parallel programming , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[4] Allan Porterfield,et al. The Tera computer system , 1990, ICS '90.
[5] Krste Asanovic,et al. The RISC-V Instruction Set Manual Volume 2: Privileged Architecture Version 1.7 , 2015 .
[6] Roberto Castañeda Lozano,et al. Constraint-Based Register Allocation and Instruction Scheduling , 2012, CP.
[7] Todd A. Proebsting. Code Generation Techniques , 1992 .
[8] Yong Chen,et al. Concurrent Dynamic Memory Coalescing on GoblinCore-64 Architecture , 2016, MEMSYS.
[9] Guang R. Gao,et al. Tile Percolation: An OpenMP Tile Aware Parallelization Technique for the Cyclops-64 Multicore Processor , 2009, Euro-Par.
[10] David Kroft,et al. Lockup-free instruction fetch/prefetch cache organization , 1981, ISCA '81.
[11] Han Li,et al. A study towards optimal data layout for GPU computing , 2012, MSPC '12.
[12] David Gordon Bradlee,et al. Retargetable instruction scheduling for pipelined processors , 1991 .
[13] Anand Sivasubramaniam,et al. Going the distance for TLB prefetching: an application-driven study , 2002, ISCA.
[14] Douglas Thain,et al. Qthreads: An API for programming with millions of lightweight threads , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[15] Carole-Jean Wu,et al. PACMan: Prefetch-Aware Cache Management for high performance caching , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[16] David Patterson,et al. An Agile Approach to Building RISC-V Microprocessors , 2016, IEEE Micro.
[17] John D. Leidel. GoblinCore-64: A scalable, open architecture for data intensive high performance computing , 2017 .
[18] R. Weisberg. A-N-D , 2011 .
[19] José E. Moreira,et al. Dissecting Cyclops: a detailed analysis of a multithreaded architecture , 2003, CARN.
[20] Xipeng Shen,et al. On-the-fly elimination of dynamic irregularities for GPU computing , 2011, ASPLOS XVI.
[21] Scott A. Mahlke,et al. WarpPool: Sharing requests with inter-warp coalescing for throughput processors , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[22] Kunle Olukotun,et al. Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.
[23] J. Dongarra,et al. HPCG Benchmark: a New Metric for Ranking High Performance Computing Systems , 2015 .
[24] Mateo Valero,et al. Dynamic transaction coalescing , 2014, Conf. Computing Frontiers.
[25] John D. Leidel,et al. Memory Coalescing for Hybrid Memory Cube , 2018, ICPP.
[26] Thomas F. Wenisch,et al. Selective GPU caches to eliminate CPU-GPU HW cache coherence , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[27] Maya Gokhale,et al. Hybrid memory cube performance characterization on data-centric workloads , 2015, IA3@SC.
[28] Isom L. Crawford,et al. Software Optimization for High Performance Computers , 2000 .
[29] Kunle Olukotun,et al. The case for a single-chip multiprocessor , 1996, ASPLOS VII.
[30] Mateo Valero,et al. A Data Cache with Multiple Caching Strategies Tuned to Different Types of Locality , 1995, International Conference on Supercomputing.
[31] Guang R. Gao,et al. Toward a Software Infrastructure for the Cyclops-64 Cellular Architecture , 2006, 20th International Symposium on High-Performance Computing in an Advanced Collaborative Environment (HPCS'06).
[32] Paul Rosenfeld,et al. Performance Exploration of the Hybrid Memory Cube , 2014 .
[33] David A. Patterson,et al. The GAP Benchmark Suite , 2015, ArXiv.
[34] Yi Yang,et al. A GPGPU compiler for memory optimization and parallelism management , 2010, PLDI '10.
[35] Yong Chen,et al. Pressure-Driven Hardware Managed Thread Concurrency for Irregular Applications , 2017, IA3@SC.
[36] R. Hornung,et al. Hydrodynamics Challenge Problem , 2011 .
[37] Simon Kahan,et al. Tera Hardware Software Cooperation , 1997, ACM/IEEE SC 1997 Conference (SC'97).
[38] Yong Chen,et al. GoblinCore-64: A RISC-V Based Architecture for Data Intensive Computing , 2018, 2018 IEEE High Performance extreme Computing Conference (HPEC).
[39] Susan J. Eggers,et al. The Marion system for retargetable instruction scheduling , 1991, PLDI '91.
[40] Bahar Asgari,et al. Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube , 2017, 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[41] Andrew Waterman,et al. The RISC-V Instruction Set Manual. Volume 1: User-Level ISA, Version 2.0 , 2014 .
[42] Yunsup Lee,et al. The RISC-V Instruction Set Manual , 2014 .
[43] Alejandro Duran,et al. Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP , 2009, 2009 International Conference on Parallel Processing.
[44] David Mizell,et al. Early experiences with large-scale Cray XMT systems , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[45] Keith D. Cooper,et al. Tailoring graph-coloring register allocation for runtime compilation , 2006, International Symposium on Code Generation and Optimization (CGO'06).
[46] John D. Leidel,et al. CHOMP: A Framework and Instruction Set for Latency Tolerant, Massively Multithreaded Processors , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[47] Geoffrey Ingram Taylor,et al. The formation of a blast wave by a very intense explosion. - II. The atomic explosion of 1945 , 1950, Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences.
[48] Vikram S. Adve,et al. The LLVM Instruction Set and Compilation Strategy , 2002 .
[49] Antonino Tumeo,et al. MAC: Memory Access Coalescer for 3D-Stacked Memory , 2019, ICPP.
[50] Bo Wu,et al. Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU , 2013, PPoPP '13.
[51] Michael A. Heroux,et al. Toward a New Metric for Ranking High Performance Computing Systems , 2013, Sandia Report.
[52] David H. Bailey,et al. NAS parallel benchmark results , 1993, IEEE Parallel & Distributed Technology: Systems & Applications.