Towards a Deterministic Fine-Grained Task Ordering Using Multi-Versioned Memory
Mark Oskin | Yoav Etsion | Eran Gilad | Tehila Mayzels | Elazar Raab
[1] Guido Araujo,et al. The Batched DOACROSS loop parallelization algorithm , 2015, 2015 International Conference on High Performance Computing & Simulation (HPCS).
[2] Bradley C. Kuszmaul,et al. Cilk: an efficient multithreaded runtime system , 1995, PPOPP '95.
[3] Eduard Ayguadé,et al. Task Superscalar: An Out-of-Order Task Pipeline , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[4] Cong Yan,et al. A scalable architecture for ordered parallelism , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[5] Rudolf Bayer,et al. Concurrency of operations on B-trees , 1994, Acta Informatica.
[6] Gu-Yeon Wei,et al. HELIX: automatic parallelization of irregular programs for chip multiprocessing , 2012, CGO '12.
[7] Antonia Zhai,et al. A scalable approach to thread-level speculation , 2000, Proceedings of 27th International Symposium on Computer Architecture.
[8] Kunle Olukotun,et al. Software and Hardware for Exploiting Speculative Parallelism with a Multiprocessor , 1997 .
[9] Jesús Labarta,et al. Handling task dependencies under strided and aliased references , 2010, ICS '10.
[10] Josep Torrellas,et al. Architectural support for scalable speculative parallelization in shared-memory multiprocessors , 2000, Proceedings of 27th International Symposium on Computer Architecture.
[11] Gurindar S. Sohi,et al. Speculative Versioning Cache , 2001, IEEE Trans. Parallel Distributed Syst..
[12] John G. Cleary,et al. Timestamp representations for virtual sequences , 1997 .
[13] Tarek S. Abdelrahman,et al. Architectural support for synchronization-free deterministic parallel programming , 2012, IEEE International Symposium on High-Performance Computer Architecture.
[14] Kunle Olukotun,et al. Data speculation support for a chip multiprocessor , 1998, ASPLOS VIII.
[15] Albert Cohen,et al. OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs , 2012, TACO.
[16] Wei Liu,et al. Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation , 2005, ICS '05.
[17] David I. August,et al. Decoupled software pipelining with the synchronization array , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[18] Dimitrios S. Nikolopoulos,et al. A Unified Scheduler for Recursive and Task Dataflow Parallelism , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[19] Eduard Ayguadé,et al. Hierarchical Task-Based Programming With StarSs , 2009, Int. J. High Perform. Comput. Appl..
[20] R. Karp,et al. Properties of a model for parallel computations: determinacy , 1966 .
[21] Michael I. Gordon,et al. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs , 2006, ASPLOS XII.
[22] Haitham Akkary,et al. A dynamic multithreading processor , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[23] Eduard Ayguadé,et al. Task-Based Programming with OmpSs and Its Application , 2014, Euro-Par Workshops.
[24] Jesús Labarta,et al. A dependency-aware task-based programming environment for multi-core architectures , 2008, 2008 IEEE International Conference on Cluster Computing.
[25] Gurindar S. Sohi,et al. Dataflow execution of sequential imperative programs on multicore architectures , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[26] Matteo Frigo,et al. The implementation of the Cilk-5 multithreaded language , 1998, PLDI.
[27] Mark Oskin,et al. O-structures: semantics for versioned memory , 2014, MSPC@PLDI.
[28] Josep Torrellas,et al. A Chip-Multiprocessor Architecture with Speculative Multithreading , 1999, IEEE Trans. Computers.
[29] William Thies,et al. StreamIt: A Language for Streaming Applications , 2002, CC.
[30] Maurice Herlihy,et al. Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[31] Keshav Pingali,et al. I-structures: Data structures for parallel computing , 1986, Graph Reduction.
[32] Richard W. Vuduc,et al. Branch-Avoiding Graph Algorithms , 2014, SPAA.
[33] Eduard Ayguadé,et al. Integrating Dataflow Abstractions into the Shared Memory Model , 2012, 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing.
[34] Somayeh Sardashti,et al. The gem5 simulator , 2011, CARN.
[35] Arvind,et al. M-Structures: Extending a Parallel, Non-strict, Functional Language with State , 1991, FPCA.
[36] Alejandro Duran,et al. The Design of OpenMP Tasks , 2009, IEEE Transactions on Parallel and Distributed Systems.
[37] Ron Cytron,et al. Doacross: Beyond Vectorization for Multiprocessors , 1986, ICPP.
[38] Rosa M. Badia,et al. CellSs: a Programming Model for the Cell BE Architecture , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[39] Andreas Moshovos,et al. Dynamic Speculation and Synchronization of Data Dependences , 1997, ISCA.
[40] Jacob Nelson,et al. Latency-Tolerant Software Distributed Shared Memory , 2015, USENIX ATC.
[41] John Paul Shen,et al. Mitosis: A Speculative Multithreaded Processor Based on Precomputation Slices , 2008, IEEE Transactions on Parallel and Distributed Systems.
[42] Antonia Zhai,et al. The STAMPede approach to thread-level speculation , 2005, TOCS.