DySHARQ: Dynamic Software-Defined Hardware-Managed Queues for Tile-Based Architectures
Andreas Herkersdorf | Thomas Wild | Florian Schmaus | Wolfgang Schröder-Preikschat | Sven Rheindt | Sebastian Maier | Nora Pohle | Lars Nolte | Oliver Lenke
[1] Eric A. Brewer, et al. Remote queues: exposing message queues for optimization and atomicity, 1995, SPAA '95.
[2] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, 1994.
[3] Sanghoon Lee, et al. HAQu: Hardware-accelerated queueing for fine-grained threading on a chip multiprocessor, 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[4] Maged M. Michael, et al. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms, 1996, PODC '96.
[5] Henry Hoffmann, et al. On-Chip Interconnection Architecture of the Tile Processor, 2007, IEEE Micro.
[6] Jaspal Subhlok, et al. Characterizing NAS benchmark performance on shared heterogeneous networks, 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[7] Timothy Mattson, et al. A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS, 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).
[8] Ron Sass, et al. Exploring hardware work queue support for lightweight threads in MPSoCs, 2012, 2012 International Conference on Reconfigurable Computing and FPGAs.
[9] B. Grundmann, et al. From Single Core to Multi-Core: Preparing for a new exponential, 2006, 2006 IEEE/ACM International Conference on Computer Aided Design.
[10] Ren Wang, et al. CAF: Core to core Communication Acceleration Framework, 2016, 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT).
[11] David Wentzlaff, et al. Processor: A 64-Core SoC with Mesh Interconnect, 2010.
[12] Sparsh Mittal. A survey on evaluating and optimizing performance of Intel Xeon Phi, 2020, Concurr. Comput. Pract. Exp.
[13] David Patterson, et al. A Case for Intelligent RAM, 1997.
[14] Anant Agarwal, et al. Integrating message-passing and shared-memory: early experience, 1993, PPOPP '93.
[15] Mark Moir, et al. Concurrent Data Structures, 2004, Handbook of Data Structures and Applications.
[16] David H. Bailey, et al. The NAS Parallel Benchmarks, 1991, Int. J. High Perform. Comput. Appl.
[17] Babak Falsafi, et al. Scale-out processors, 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[18] Jongman Kim, et al. IsoNet: Hardware-Based Job Queue Management for Many-Core Architectures, 2013, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[19] Timo Hönig, et al. Asynchronous Abstract Machines: Anti-noise System Software for Many-core Processors, 2019.
[20] Filip Moerman. Open event machine: A multi-core run-time designed for performance, 2014, 2014 6th European Embedded Design in Education and Research Conference (EDERC).
[21] Jürgen Teich, et al. Efficient task spawning for shared memory and message passing in many-core architectures, 2017, J. Syst. Archit.
[22] Wolfgang Schröder-Preikschat, et al. SHARQ: Software-Defined Hardware-Managed Queues for Tile-Based Manycore Architectures, 2019, SAMOS.
[23] Sally A. McKee, et al. Hitting the memory wall: implications of the obvious, 1995, CARN.
[24] Jürgen Teich, et al. The Invasive Network on Chip - A Multi-Objective Many-Core Communication Infrastructure, 2014, ARCS Workshops.
[25] Christoforos E. Kozyrakis, et al. A case for intelligent RAM, 1997, IEEE Micro.
[26] André Schiper, et al. Leveraging Hardware Message Passing for Efficient Thread Synchronization, 2016, ACM Trans. Parallel Comput.
[27] Andreas Herkersdorf, et al. TCU: A Multi-Objective Hardware Thread Mapping Unit for HPC Clusters, 2016, ISC.
[28] Jürgen Teich, et al. Invasive Computing: An Overview, 2011, Multiprocessor System-on-Chip.
[29] Lars Bauer, et al. System Software for Resource Arbitration on Future Many-* Architectures, 2020, 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[30] Christopher J. Hughes, et al. Carbon: architectural support for fine-grained parallelism on chip multiprocessors, 2007, ISCA '07.
[31] Hsiao-Keng Jerry Chu, et al. Zero-Copy TCP in Solaris, 1996, USENIX Annual Technical Conference.
[32] Jean-Philippe Diguet, et al. Subutai: Distributed Synchronization Primitives in NoC Interfaces for Legacy Parallel-Applications, 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).
[33] Christoforos E. Kozyrakis, et al. Flexible architectural support for fine-grain scheduling, 2010, ASPLOS XV.
[34] Rainer Buchty, et al. Data-Centric Computing Frontiers: A Survey On Processing-In-Memory, 2016, MEMSYS.
[35] Andreas Schenk, et al. CaCAO: Complex and Compositional Atomic Operations for NoC-Based Manycore Platforms, 2018, ARCS.