Flexible architectural support for fine-grain scheduling
[1] Sanjay J. Patel,et al. Rigel: an architecture and scalable programming interface for a 1000-core accelerator , 2009, ISCA '09.
[2] Rajiv Gupta,et al. ECMon: exposing cache events for monitoring , 2009, ISCA '09.
[3] Ronald G. Dreslinski,et al. The M5 Simulator: Modeling Networked Systems , 2006, IEEE Micro.
[4] David A. Wood,et al. Variability in architectural simulations of multi-threaded workloads , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.
[5] Hong Jiang,et al. Pangaea: A tightly-coupled IA32 heterogeneous chip multiprocessor , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[6] Sriram Krishnamoorthy,et al. Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing , 2008, 2008 37th International Conference on Parallel Processing.
[7] Keshav Pingali,et al. Scheduling strategies for optimistic parallel execution of irregular programs , 2008, SPAA '08.
[8] Magnus Själander,et al. A Look-Ahead Task Management Unit for Embedded Multi-Core Architectures , 2008, 2008 11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools.
[9] Larry Rudolph,et al. Message passing support on StarT-Voyager , 1998, Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238).
[10] Guy E. Blelloch,et al. Provably efficient scheduling for languages with fine-grained parallelism , 1999, JACM.
[11] David I. August,et al. Decoupled software pipelining with the synchronization array , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[12] David Chase,et al. Dynamic circular work-stealing deque , 2005, SPAA '05.
[13] Matteo Frigo,et al. The implementation of the Cilk-5 multithreaded language , 1998, PLDI.
[14] Quinn Jacobson,et al. Disintermediated Active Communication , 2006, IEEE Computer Architecture Letters.
[15] David A. Bader,et al. A Cache-Aware Parallel Implementation of the Push-Relabel Network Flow Algorithm and Experimental Evaluation of the Gap Relabeling Heuristic , 2006, PDCS.
[16] Zhen Fang,et al. Quantifying the performance contribution of various aspects of AMOs , 2022 .
[17] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[18] William J. Dally,et al. Design tradeoffs for tiled CMP on-chip networks , 2006, ICS '06.
[19] Andrei Sergeevich Terechko,et al. A Hardware Task Scheduler for Embedded Video Processing , 2008, HiPEAC.
[20] Thomas Rauber,et al. Performance Evaluation of Task Pools Based on Hardware Synchronization , 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[21] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.
[22] Alejandro Duran,et al. Evaluation of OpenMP Task Scheduling Strategies , 2008, IWOMP.
[23] Michael F. Spear,et al. Alert-on-update: a communication aid for shared memory multiprocessors , 2007, PPOPP.
[24] Pat Hanrahan,et al. GRAMPS: A programming model for graphics pipelines , 2009, ACM Trans. Graph.
[25] David A. Bader,et al. GTfold: a scalable multicore code for RNA secondary structure prediction , 2009, SAC '09.
[26] Pradeep Dubey,et al. Larrabee: A Many-Core x86 Architecture for Visual Computing , 2009, IEEE Micro.
[27] William J. Dally,et al. An Efficient, Protected Message Interface , 1998, Computer.
[28] Seth Copen Goldstein,et al. Active Messages: A Mechanism for Integrated Communication and Computation , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.
[29] David Wentzlaff,et al. Processor: A 64-Core SoC with Mesh Interconnect , 2010 .
[30] Greg Grohoski. Niagara-2: A highly threaded server-on-a-chip , 2006, 2006 IEEE Hot Chips 18 Symposium (HCS).
[31] James R. Goodman,et al. Efficient Synchronization: Let Them Eat QOLB , 1997, International Symposium on Computer Architecture.
[32] Robert D. Blumofe,et al. Scheduling multithreaded computations by work stealing , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.
[33] William J. Dally,et al. The J-machine Multicomputer: An Architectural Evaluation , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[34] Vivek Sarkar,et al. Work-First and Help-First Scheduling Policies for Terminally Strict Parallel Programs , 2008 .
[35] Milo M. K. Martin,et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.
[36] C. Greg Plaxton,et al. Thread Scheduling for Multiprogrammed Multiprocessors , 1998, SPAA '98.
[37] John F. Canny,et al. A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[38] Ricardo Bianchini,et al. The MIT Alewife machine: architecture and performance , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[39] Christopher J. Hughes,et al. Carbon: architectural support for fine-grained parallelism on chip multiprocessors , 2007, ISCA '07.
[40] Mark D. Hill,et al. Amdahl's Law in the Multicore Era , 2008 .
[41] Vivek Sarkar,et al. Deadlock-free scheduling of X10 computations with bounded resources , 2007, SPAA '07.
[42] William J. Dally,et al. Principles and Practices of Interconnection Networks , 2004 .
[43] Guy E. Blelloch,et al. Scheduling threads for constructive cache sharing on CMPs , 2007, SPAA '07.
[44] Victor Lee,et al. Exploiting two-case delivery for fast protected messaging , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.