Enhancing Productivity and Performance Portability of General-Purpose Parallel Programming
[1] Edward D. Lazowska,et al. Speedup Versus Efficiency in Parallel Systems , 1989, IEEE Trans. Computers.
[2] Robert D. Blumofe,et al. Scheduling multithreaded computations by work stealing , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.
[3] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system , 1978 .
[4] Alejandro Duran,et al. An adaptive cut-off for task parallelism , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[5] Yi Guo,et al. SLAW: A scalable locality-aware adaptive work-stealing scheduler , 2010, IPDPS.
[6] Gang Qu,et al. Mesh-of-Trees and Alternative Interconnection Networks for Single-Chip Parallelism , 2009, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[7] Edsger W. Dijkstra,et al. Go To Statement Considered Harmful , 2002, Software Pioneers.
[8] Uzi Vishkin,et al. A Low-Overhead Asynchronous Interconnection Network for GALS Chip Multiprocessors , 2011, 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip.
[9] Gang Qu,et al. Layout-Accurate Design and Implementation of a High-Throughput Interconnection Network for Single-Chip Parallel Processing , 2007, 15th Annual IEEE Symposium on High-Performance Interconnects (HOTI 2007).
[10] Matteo Frigo,et al. The implementation of the Cilk-5 multithreaded language , 1998, PLDI.
[11] Charles E. Leiserson,et al. The Cilk++ concurrency platform , 2009, 2009 46th ACM/IEEE Design Automation Conference.
[12] Sanguthevar Rajasekaran,et al. Handbook of Parallel Computing - Models, Algorithms and Applications , 2007 .
[13] Fuat Keceli,et al. Power-Performance Comparison of Single-Task Driven Many-Cores , 2011, 2011 IEEE 17th International Conference on Parallel and Distributed Systems.
[14] Sarita V. Adve,et al. DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[15] Guy E. Blelloch,et al. Compiling Collection-Oriented Languages onto Massively Parallel Computers , 1990, J. Parallel Distributed Comput.
[16] Uzi Vishkin,et al. PRAM-on-chip: first commitment to silicon , 2007, SPAA '07.
[17] Richard Cole,et al. Deterministic coin tossing and accelerating cascades: micro and macro techniques for designing parallel algorithms , 1986, STOC '86.
[18] Ronald L. Rivest,et al. Introduction to Algorithms, third edition , 2009 .
[19] Christoph W. Kessler,et al. Practical PRAM programming , 2000, Wiley series on parallel and distributed computing.
[20] Herb Sutter,et al. The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software , 2013 .
[21] Hans-Juergen Boehm,et al. HP Laboratories , 2006 .
[22] Robert H. Halstead,et al. Implementation of multilisp: Lisp on a multiprocessor , 1984, LFP '84.
[23] Yuxiong He,et al. The Cilkview scalability analyzer , 2010, SPAA '10.
[24] Fuat Keceli,et al. Resource-Aware Compiler Prefetching for Many-Cores , 2010, 2010 Ninth International Symposium on Parallel and Distributed Computing.
[25] Uzi Vishkin,et al. Using simple abstraction to reinvent computing for parallelism , 2011, Commun. ACM.
[26] Uzi Vishkin,et al. A pilot study to compare programming effort for two parallel programming models , 2007, J. Syst. Softw.
[27] Seth Copen Goldstein,et al. Lazy Threads: Implementing a Fast Parallel Call , 1996, J. Parallel Distributed Comput.
[28] Anoop Gupta,et al. Parallel computer architecture - a hardware / software approach , 1998 .
[29] Mark Moir,et al. A dynamic-sized nonblocking work stealing deque , 2006, Distributed Computing.
[30] F. Warren Burton,et al. Executing functional programs on a virtual tree of processors , 1981, FPCA '81.
[31] Maged M. Michael,et al. Idempotent work stealing , 2009, PPoPP '09.
[32] Doug Lea,et al. A Java fork/join framework , 2000, JAVA '00.
[33] Eduard Ayguadé,et al. Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors , 1999, ICS '99.
[34] Lars Bergstrom,et al. Lazy tree splitting , 2012, J. Funct. Program.
[35] C. Greg Plaxton,et al. Thread Scheduling for Multiprogrammed Multiprocessors , 1998, SPAA '98.
[36] Alexandros Tzannes,et al. Lazy binary-splitting: a run-time adaptive work-stealing scheduler , 2010, PPoPP '10.
[37] Uzi Vishkin,et al. XMT-GPU: A PRAM Architecture for Graphics Computation , 2008, 2008 37th International Conference on Parallel Processing.
[38] George C. Caragea,et al. General-Purpose vs. GPU: Comparison of Many-Cores on Irregular Workloads , 2010 .
[39] Fuat Keceli,et al. Power and Performance studies of the Explicit Multi-Threading (XMT) Architecture , 2011 .
[40] Eleftherios D. Polychronopoulos,et al. Efficient Runtime Thread Management for the Nano-Threads Programming Model , 1998, IPPS/SPDP Workshops.
[41] Guy E. Blelloch,et al. Implementation of a portable nested data-parallel language , 1993, PPOPP '93.
[42] A. B. Saybasili. Highly Parallel Multi-Dimensional Fast Fourier Transform on Fine- and Coarse-Grained Many-Core Approaches , 2022 .
[43] Zheng Li,et al. Scalable hardware support for conditional parallelization , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[44] Aydin O. Balkan. Mesh-of-Trees Interconnection Network for an Explicitly Multi-Threaded Parallel Computer Architecture , 2008 .
[45] Robert H. Halstead,et al. Mul-T: a high-performance parallel Lisp , 1989, PLDI '89.
[46] Uzi Vishkin,et al. Is teaching parallel algorithmic thinking to high school students possible?: one teacher's experience , 2010, SIGCSE.
[47] Sebastian Burckhardt,et al. The design of a task parallel library , 2009, OOPSLA.
[48] Arthur Charguéraud,et al. Oracle scheduling: controlling granularity in implicitly parallel languages , 2011, OOPSLA '11.
[49] Uzi Vishkin,et al. Brief announcement: speedups for parallel graph triconnectivity , 2012, SPAA '12.
[50] Matteo Frigo,et al. Reducers and other Cilk++ hyperobjects , 2009, SPAA '09.
[51] Gang Qu,et al. A Mesh-of-Trees Interconnection Network for Single-Chip Parallel Processing , 2006, IEEE 17th International Conference on Application-specific Systems, Architectures and Processors (ASAP'06).
[52] Uzi Vishkin,et al. Towards a First Vertical Prototyping of an Extremely Fine-Grained Parallel Programming Approach , 2003, Theory of Computing Systems.
[53] Leslie G. Valiant,et al. A bridging model for parallel computation , 1990, CACM.
[54] George C. Caragea,et al. Models for Advancing PRAM and Other Algorithms into Parallel Programs for a PRAM-On-Chip Platform , 2006, Handbook of Parallel Computing.
[55] George C. Caragea,et al. Brief announcement: better speedups for parallel max-flow , 2011, SPAA '11.
[56] George C. Caragea. Optimizing for a Many-Core Architecture without Compromising Ease-of-Programming , 2011 .
[57] Laurie J. Hendren,et al. SableCC, an object-oriented compiler framework , 1998, Proceedings. Technology of Object-Oriented Languages. TOOLS 26 (Cat. No.98EX176).
[58] I-Ting Angelina Lee,et al. Location-based memory fences , 2011, SPAA '11.
[59] Nir Shavit,et al. Non-blocking steal-half work queues , 2002, PODC '02.
[60] Fuat Keceli,et al. Thermal Management of a Many-Core Processor under Fine-Grained Parallelism , 2011, Euro-Par Workshops.
[61] Michael Voss,et al. Optimization via Reflection on Work Stealing in TBB , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[62] Yi Guo,et al. SLAW: A scalable locality-aware adaptive work-stealing scheduler , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[63] Jesper Larsson Träff,et al. User-Land Work Stealing Schedulers: Towards a Standard , 2008, 2008 International Conference on Complex, Intelligent and Software Intensive Systems.
[64] Gang Qu,et al. An area-efficient high-throughput hybrid interconnection network for single-chip parallel processing , 2008, 2008 45th ACM/IEEE Design Automation Conference.
[65] Fuat Keceli,et al. Toolchain for Programming, Simulating and Studying the XMT Many-Core Architecture , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[66] Uzi Vishkin,et al. Better speedups using simpler parallel programming for graph connectivity and biconnectivity , 2012, PMAM '12.
[67] Benjamin Hindman,et al. Composing parallel software efficiently with lithe , 2010, PLDI '10.
[68] Uzi Vishkin,et al. Explicit multi-threading (XMT) bridging models for instruction parallelism (extended abstract) , 1998, SPAA '98.
[69] Olivier Temam,et al. A Practical Approach for Reconciling High and Predictable Performance in Non-Regular Parallel Programs , 2008, 2008 Design, Automation and Test in Europe.
[70] George C. Necula,et al. CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs , 2002, CC.
[71] Ramesh Subramonian,et al. LogP: towards a realistic model of parallel computation , 1993, PPOPP '93.
[72] Akinori Yonezawa,et al. StackThreads/MP: integrating futures into calling standards , 1999, PPoPP '99.
[73] Uzi Vishkin,et al. An O(n² log n) Parallel MAX-FLOW Algorithm , 1982, J. Algorithms.
[74] James Reinders,et al. Intel® threading building blocks , 2008 .
[75] Seth Copen Goldstein,et al. Lazy threads: compiler and runtime structures for fine-grained parallel programming , 1998 .
[76] George C. Caragea,et al. Brief announcement: performance potential of an easy-to-program PRAM-on-chip prototype versus state-of-the-art processor , 2009, SPAA '09.
[77] Jeremy Manson,et al. The Java memory model , 2005, POPL '05.
[78] Ralph Grishman,et al. The NYU Ultracomputer—designing a MIMD, shared-memory parallel machine (Extended Abstract) , 1982, ISCA 1982.
[79] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl.
[80] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[81] Uzi Vishkin,et al. FPGA-based prototype of a PRAM-on-chip processor , 2008, CF '08.
[82] Alejandro Duran,et al. Automatic thread distribution for nested parallelism in OpenMP , 2005, ICS '05.
[83] Sarita V. Adve,et al. Shared Memory Consistency Models: A Tutorial , 1996, Computer.
[84] Kunle Olukotun,et al. Efficient Parallel Graph Exploration on Multi-Core CPU and GPU , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[85] Xingzhi Wen. Hardware Design, Prototyping and Studies of the Explicit Multi-Threading (XMT) Paradigm , 2008 .
[86] David Chase,et al. Dynamic circular work-stealing deque , 2005, SPAA '05.
[87] Robert H. Halstead,et al. Lazy task creation: a technique for increasing the granularity of parallel programs , 1990, LISP and Functional Programming.
[88] S. Sitharama Iyengar,et al. Introduction to parallel algorithms , 1998, Wiley series on parallel and distributed computing.
[89] Christopher J. Hughes,et al. Carbon: architectural support for fine-grained parallelism on chip multiprocessors , 2007, ISCA '07.
[90] Guy E. Blelloch,et al. The Data Locality of Work Stealing , 2002, SPAA '00.