Data-centric execution of speculative parallel programs
Daniel Sánchez | Joel S. Emer | Suvinay Subramanian | Maleen Abeydeera | Mark C. Jeffrey
[1] Jacob Nelson,et al. Latency-Tolerant Distributed Shared Memory For Data-Intensive Applications , 2015 .
[2] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[3] Kunle Olukotun,et al. STAMP: Stanford Transactional Applications for Multi-Processing , 2008, 2008 IEEE International Symposium on Workload Characterization.
[4] Osman S. Unsal,et al. HARP: Adaptive abort recurrence prediction for Hardware Transactional Memory , 2013, 20th Annual International Conference on High Performance Computing.
[5] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[6] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[7] Ronald G. Dreslinski,et al. Proactive transaction scheduling for contention management , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[8] Vivek Sarkar,et al. Deadlock-free scheduling of X10 computations with bounded resources , 2007, SPAA '07.
[9] Christoforos E. Kozyrakis,et al. Locality-aware task management for unstructured parallelism: a quantitative limit study , 2013, SPAA.
[10] Ye Sun,et al. Distributed Transactional Memory for Metric-Space Networks , 2005, DISC.
[11] Keshav Pingali,et al. Priority Queues Are Not Good Concurrent Priority Schedulers , 2015, Euro-Par.
[12] Keshav Pingali,et al. Synthesizing concurrent schedulers for irregular algorithms , 2011, ASPLOS XVI.
[13] Josep Torrellas,et al. Bulk Disambiguation of Speculative Threads in Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[14] Christoforos E. Kozyrakis,et al. ZSim: fast and accurate microarchitectural simulation of thousand-core systems , 2013, ISCA.
[15] Alejandro Duran,et al. Evaluation of OpenMP Task Scheduling Strategies , 2008, IWOMP.
[16] Hagit Attiya,et al. RelSTM: A Proactive Transactional Memory Scheduler , 2013 .
[17] Christoforos E. Kozyrakis,et al. Dynamic Fine-Grain Scheduling of Pipeline Parallelism , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[18] Gurindar S. Sohi,et al. Multiscalar processors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[19] Ronald G. Dreslinski,et al. Bloom Filter Guided Transaction Scheduling , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[20] Krste Asanovic,et al. Controlling program execution through binary instrumentation , 2005, CARN.
[21] Eddie Kohler,et al. Speedy transactions in multicore in-memory databases , 2013, SOSP.
[22] Cong Yan,et al. A scalable architecture for ordered parallelism , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[23] Danny Hendler,et al. CAR-STM: scheduling-based collision avoidance and resolution for software transactional memory , 2008, PODC '08.
[24] Nuno Diegues,et al. Seer: Probabilistic Scheduling for Hardware Transactional Memory , 2015, ACM Trans. Comput. Syst..
[25] Kunle Olukotun,et al. Transactional memory coherence and consistency , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[26] Ye Sun,et al. Distributed transactional memory for metric-space networks , 2005, Distributed Computing.
[27] Christopher J. Hughes,et al. Carbon: architectural support for fine-grained parallelism on chip multiprocessors , 2007, ISCA '07.
[28] Kunle Olukotun,et al. A Scalable, Non-blocking Approach to Transactional Memory , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[29] Guy E. Blelloch,et al. The Data Locality of Work Stealing , 2002, SPAA '00.
[30] Bradford L. Chamberlain,et al. Software transactional memory for large scale clusters , 2008, PPoPP.
[31] J. P. Grossman,et al. Hardware support for fine-grained event-driven computation in Anton 2 , 2013, ASPLOS '13.
[32] Keshav Pingali,et al. Optimistic parallelism benefits from data partitioning , 2008, ASPLOS.
[33] Jure Leskovec,et al. SNAP Datasets: Stanford Large Network Dataset Collection , 2014 .
[34] Emmett Kilgariff,et al. Fermi GF100 GPU Architecture , 2011, IEEE Micro.
[35] Luis Ceze,et al. Alembic: automatic locality extraction via migration , 2014, OOPSLA.
[36] Keshav Pingali,et al. Synthesizing parallel graph programs via automated planning , 2015, PLDI.
[37] Rachid Guerraoui,et al. Preventing versus curing: avoiding conflicts in transactional memories , 2009, PODC '09.
[38] Guy E. Blelloch,et al. Scheduling threads for constructive cache sharing on CMPs , 2007, SPAA '07.
[39] Wei Liu,et al. Thread-Level Speculation on a CMP can be energy efficient , 2005, ICS '05.
[40] Antonia Zhai,et al. A scalable approach to thread-level speculation , 2000, Proceedings of 27th International Symposium on Computer Architecture.
[41] Keshav Pingali,et al. Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms , 2011, PPoPP '11.
[42] Niraj K. Jha,et al. GARNET: A detailed on-chip network model inside a full-system simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[43] David A. Wood,et al. LogTM: log-based transactional memory , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006.
[44] David R. Jefferson,et al. Virtual time , 1985, ICPP.
[45] D. J. A. Welsh,et al. An upper bound for the chromatic number of a graph and its application to timetabling problems , 1967, Comput. J..
[46] Guy E. Blelloch,et al. Experimental Analysis of Space-Bounded Schedulers , 2016, ACM Trans. Parallel Comput..
[47] Larry Carter,et al. Universal classes of hash functions (Extended Abstract) , 1977, STOC '77.
[48] Josep Torrellas,et al. ScalableBulk: Scalable Cache Coherence for Atomic Blocks in a Lazy Environment , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[49] Pat Hanrahan,et al. GRAMPS: A programming model for graphics pipelines , 2009, ACM Trans. Graph..
[50] Joel Emer,et al. Unlocking Ordered Parallelism with the Swarm Architecture , 2016, IEEE Micro.
[51] Binoy Ravindran,et al. HyFlow: a high performance distributed software transactional memory framework , 2011, HPDC '11.
[52] Todd C. Mowry,et al. The potential for using thread-level data speculation to facilitate automatic parallelization , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[53] J. Gregory Steffan,et al. Improving cache locality for thread-level speculation , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[54] Henry Hoffmann,et al. On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.
[55] Keshav Pingali,et al. Scheduling strategies for optimistic parallel execution of irregular programs , 2008, SPAA '08.
[56] Nils J. Nilsson,et al. A Formal Basis for the Heuristic Determination of Minimum Cost Paths , 1968, IEEE Trans. Syst. Sci. Cybern..
[57] Yi Guo,et al. SLAW: A scalable locality-aware adaptive work-stealing scheduler , 2010, IPDPS.
[58] Jason Duell,et al. Productivity and performance using partitioned global address space languages , 2007, PASCO '07.
[59] Christoforos E. Kozyrakis,et al. Flexible architectural support for fine-grain scheduling , 2010, ASPLOS XV.
[60] Mikel Luján,et al. Steal-on-Abort: Improving Transactional Memory Performance through Dynamic Transaction Reordering , 2008, HiPEAC.
[61] Charles E. Leiserson,et al. Ordering heuristics for parallel graph coloring , 2014, SPAA.
[62] Charles E. Leiserson,et al. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers) , 2010, SPAA '10.
[63] Wei Liu,et al. Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation , 2005, ICS '05.
[64] Benoît Dupont de Dinechin,et al. A clustered manycore processor architecture for embedded and accelerated applications , 2013, 2013 IEEE High Performance Extreme Computing Conference (HPEC).
[65] David A. Wood,et al. LogTM-SE: Decoupling Hardware Transactional Memory from Caches , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[66] Emmett Witchel,et al. Is transactional programming actually easier? , 2010, PPoPP '10.
[67] Hsien-Hsin S. Lee,et al. Adaptive transaction scheduling for transactional memory systems , 2008, SPAA '08.
[68] Carlo Curino,et al. Schism , 2010, Proc. VLDB Endow..
[69] Josep Torrellas,et al. Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors , 2005, TACO.
[70] Kunle Olukotun,et al. Data speculation support for a chip multiprocessor , 1998, ASPLOS VIII.
[71] Keshav Pingali,et al. The tao of parallelism in algorithms , 2011, PLDI '11.
[72] Tim Weninger,et al. Thinking Like a Vertex , 2015, ACM Comput. Surv..
[73] T. N. Vijaykumar,et al. Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies , 2013, ASPLOS '13.