Data-Driven Concurrency for High Performance Computing
[1] Ian Watson,et al. The Manchester prototype dataflow computer , 1985, CACM.
[2] Dan Bonachea. GASNet Specification, v1.1 , 2002 .
[3] Vítor Santos Costa,et al. Trebuchet: exploring TLP with dataflow virtualisation , 2011, Int. J. High Perform. Syst. Archit..
[4] J. Demmel,et al. Sun Microsystems , 1996 .
[5] Philippe Olivier Alexandre Navaux,et al. Challenges and Issues of Supporting Task Parallelism in MPI , 2010, EuroMPI.
[6] Ali R. Hurson,et al. Dataflow architectures and multithreading , 1994, Computer.
[7] Paraskevas Evripidou,et al. Data-flow Concurrency on Distributed Multi-core Systems , 2013 .
[8] Marco Danelutto,et al. FastFlow: High-level and Efficient Streaming on Multi-core , 2017 .
[9] Wei Ge,et al. The Sunway TaihuLight supercomputer: system and applications , 2016, Science China Information Sciences.
[10] Josep Torrellas,et al. Data forwarding in scalable shared-memory multiprocessors , 1995, ICS '95.
[11] Samuel H. Fuller,et al. Computing Performance: Game Over or Next Level? , 2011, Computer.
[12] Kathleen Knobe,et al. Concurrent Collections on Distributed Memory Theory Put into Practice , 2013, 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing.
[13] Samer Arandi,et al. The data-driven multithreading virtual machine , 2012 .
[14] Eduard Ayguadé,et al. Hierarchical Task-Based Programming With StarSs , 2009, Int. J. High Perform. Comput. Appl..
[15] Hartmut Kaiser,et al. HPX: A Task Based Programming Model in a Global Address Space , 2014, PGAS.
[16] Paraskevas Evripidou,et al. Architectural Support for Data-Driven Execution , 2015, ACM Trans. Archit. Code Optim..
[17] Paraskevas Evripidou,et al. DDM-VMc: the data-driven multithreading virtual machine for the cell processor , 2011, HiPEAC.
[18] James Reinders,et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism , 2007 .
[19] Margaret H. Wright,et al. The opportunities and challenges of exascale computing , 2010 .
[20] Pen-Chung Yew,et al. Data Prefetching and Data Forwarding in Shared Memory Multiprocessors , 1994, 1994 International Conference on Parallel Processing Vol. 2.
[21] Christina Freytag,et al. Using MPI: Portable Parallel Programming with the Message Passing Interface , 2016 .
[22] Paraskevas Evripidou,et al. Data-Driven Multithreading Using Conventional Microprocessors , 2006, IEEE Transactions on Parallel and Distributed Systems.
[23] Paraskevas Evripidou,et al. Verilog-based simulation of hardware support for data-flow concurrency on multicore systems , 2013, 2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS).
[24] Julien Langou,et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures , 2007, Parallel Comput..
[25] Jack B. Dennis,et al. First version of a data flow procedure language , 1974, Symposium on Programming.
[26] Thomas Hérault,et al. Scalable Dense Linear Algebra on Heterogeneous Hardware , 2012, High Performance Computing Workshop.
[27] Tracy Camp,et al. A taxonomy of distributed termination detection algorithms , 1998, J. Syst. Softw..
[28] Anthony Skjellum,et al. Using MPI: Portable Programming with the Message-Passing Interface , 1999 .
[29] Bradford L. Chamberlain,et al. Parallel Programmability and the Chapel Language , 2007, Int. J. High Perform. Comput. Appl..
[30] Guang R. Gao,et al. Position Paper: Using a "Codelet" Program Execution Model for Exascale Machines , 2011 .
[31] Benoît Meister,et al. The Open Community Runtime: A runtime system for extreme scale computing , 2016, 2016 IEEE High Performance Extreme Computing Conference (HPEC).
[32] Kathleen Knobe,et al. Ease of use with concurrent collections (CnC) , 2009 .
[33] George Bosilca,et al. PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution , 2015, 2015 IEEE International Conference on Cluster Computing.
[34] J. Dongarra,et al. Lightweight Superscalar Task Execution in Distributed Memory , 2014 .
[35] Felipe Maia Galvão França,et al. Task Scheduling in Sucuri Dataflow Library , 2016, 2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW).
[36] Jack Dongarra,et al. ScaLAPACK user's guide , 1997 .
[37] Josep Torrellas,et al. Data Forwarding in Scalable Shared-Memory Multiprocessors , 1996, IEEE Trans. Parallel Distributed Syst..
[38] Paraskevas Evripidou,et al. DDMCPP : The Data-Driven Multithreading C PreProcessor , 2007 .
[39] Paraskevas Evripidou,et al. Paradigm Shift for EXASCALE Computing , 2015 .
[40] Thomas Hérault,et al. DAGuE: A Generic Distributed DAG Engine for High Performance Computing , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[41] Lapack Working. Scheduling Linear Algebra Operations on Multicore Processors , 2009 .
[42] Guang R. Gao,et al. Application characterization at scale: lessons learned from developing a distributed open community runtime system for high performance computing , 2016, Conf. Computing Frontiers.
[43] Jack Dongarra,et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects , 2009 .
[44] Bradley C. Kuszmaul,et al. Cilk: an efficient multithreaded runtime system , 1995, PPOPP '95.
[45] Oliver Pell,et al. Maximum Performance Computing with Dataflow Engines , 2012, Computing in Science & Engineering.
[46] Edsger W. Dijkstra,et al. Termination Detection for Diffusing Computations , 1980, Inf. Process. Lett..
[47] Arvind,et al. Two Fundamental Issues in Multiprocessing , 1987, Parallel Computing in Science and Engineering.
[48] Gurindar S. Sohi,et al. Dataflow execution of sequential imperative programs on multicore architectures , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[49] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.
[50] Alejandro Duran,et al. Productive Cluster Programming with OmpSs , 2011, Euro-Par.
[51] Pedro C. Diniz. Exascale Programming Challenges , 2011 .
[52] Eduard Ayguadé,et al. Implementing OmpSs support for regions of data in architectures with multiple address spaces , 2013, ICS '13.
[53] Roberto Giorgi,et al. DTA-C: A Decoupled multi-Threaded Architecture for CMP Systems , 2007, 19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07).
[54] P. Evripidou,et al. FREDDO: an efficient Framework for Runtime Execution of Data-Driven Objects , 2017 .
[55] Vítor Santos Costa,et al. Couillard: Parallel programming via coarse-grained Data-flow Compilation , 2011, Parallel Comput..
[56] Thomas L. Sterling,et al. ParalleX An Advanced Parallel Execution Model for Scaling-Impaired Applications , 2009, 2009 International Conference on Parallel Processing Workshops.
[57] Jack B. Dennis,et al. A preliminary architecture for a basic data-flow processor , 1974, ISCA '75.
[58] Paraskevas Evripidou,et al. Programming multi-core architectures using Data-Flow techniques , 2010, 2010 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation.
[59] Paraskevas Evripidou. Thread Synchronization Unit (TSU): A Building Block for High Performance Computers , 1997, ISHPC.
[60] Krishna M. Kavi,et al. Scheduled Dataflow: Execution Paradigm, Architecture, and Performance Evaluation , 2001, IEEE Trans. Computers.
[61] Paul Krzyzanowski. Distributed shared memory , 1998 .
[62] Nam Ho,et al. Dataflow Support in x86_64 Multicore Architectures through Small Hardware Extensions , 2015, 2015 Euromicro Conference on Digital System Design.
[63] James Demmel,et al. Communication-optimal Parallel and Sequential Cholesky Decomposition , 2009, SIAM J. Sci. Comput..
[64] Arvind,et al. The U-Interpreter , 1982, Computer.
[65] Katherine Yelick,et al. Introduction to UPC and Language Specification , 2000 .
[66] Peter Kilpatrick,et al. Targeting Distributed Systems in FastFlow , 2012, Euro-Par Workshops.
[67] Veljko M. Milutinovic,et al. Distributed shared memory: concepts and systems , 1997, IEEE Parallel Distributed Technol. Syst. Appl..
[68] Jack J. Dongarra,et al. Scaling up matrix computations on shared-memory manycore systems with 1000 CPU cores , 2014, ICS '14.
[69] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).