SCD: A scalable coherence directory with flexible sharer set encoding
[1] Sanjay J. Patel,et al. Rigel: an architecture and scalable programming interface for a 1000-core accelerator , 2009, ISCA '09.
[2] Torvald Riegel,et al. Optimizing hybrid transactional memory: the importance of nonspeculative operations , 2011, SPAA '11.
[3] Sanjay J. Patel,et al. WAYPOINT: scaling coherence to thousand-core architectures , 2010, PACT '10.
[4] Maurice Herlihy,et al. Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[5] Rahul Khanna,et al. RAPL: Memory power estimation and capping , 2010, 2010 ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED).
[6] Natalie D. Enright Jerger,et al. Virtual tree coherence: Leveraging regions and in-network multicast trees for scalable cache coherence , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[7] David A. Wood,et al. Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[8] Christopher J. Hughes,et al. Performance evaluation of Intel® Transactional Synchronization Extensions for high-performance computing , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[9] Kunle Olukotun,et al. STAMP: Stanford Transactional Applications for Multi-Processing , 2008, 2008 IEEE International Symposium on Workload Characterization.
[10] Rasmus Pagh,et al. Cuckoo Hashing , 2001, Encyclopedia of Algorithms.
[11] Tony Tung,et al. Scaling Memcache at Facebook , 2013, NSDI.
[12] Nir Shavit,et al. Transactional Locking II , 2006, DISC.
[13] Babak Falsafi,et al. Cuckoo directory: A scalable directory for many-core systems , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[14] Belliappa Kuttanna,et al. A Sub-1W to 2W Low-Power IA Processor for Mobile Internet Devices and Ultra-Mobile PCs in 45nm Hi-Κ Metal Gate CMOS , 2008, 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.
[15] Victor Pankratius,et al. A study of transactional memory vs. locks in practice , 2011, SPAA '11.
[16] Tudor David,et al. Everything you always wanted to know about synchronization but were afraid to ask , 2013, SOSP.
[17] Anoop Gupta,et al. Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes , 1990, ICPP.
[18] Andreas Moshovos,et al. A Framework for Coarse-Grain Optimizations in the On-Chip Memory Hierarchy , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[19] Dong-Sheng Wang,et al. Hierarchical Cache Directory for CMP , 2010, Journal of Computer Science and Technology.
[20] Nir Shavit,et al. Reduced hardware transactions: a new approach to hybrid transactional memory , 2013, SPAA.
[21] Rodolfo Azevedo,et al. Characterizing the Energy Consumption of Software Transactional Memory , 2009, IEEE Computer Architecture Letters.
[22] Sanjay J. Patel,et al. WayPoint: Scaling coherence to 1000-core architectures , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[23] Luiz André Barroso,et al. Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[24] Sean White,et al. Hybrid NOrec: a case study in the effectiveness of best effort hardware transactional memory , 2011, ASPLOS XVI.
[25] Yujie Liu,et al. Transactionalizing legacy code: an experience report using GCC and Memcached , 2014, ASPLOS.
[26] Roberto Palmieri,et al. On the analytical modeling of concurrency control algorithms for Software Transactional Memories: The case of Commit-Time-Locking , 2012, Perform. Evaluation.
[27] Nir Shavit,et al. Software transactional memory , 1995, PODC '95.
[28] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[29] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[30] Maged M. Michael,et al. Robust architectural support for transactional memory in the power architecture , 2013, ISCA.
[31] Christoforos E. Kozyrakis,et al. Vantage: Scalable and efficient fine-grain cache partitioning , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[32] Timothy Mattson,et al. A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).
[33] Michael F. Spear,et al. NOrec: streamlining STM by abolishing ownership records , 2010, PPoPP '10.
[34] Mark Moir,et al. Early experience with a commercial hardware transactional memory implementation , 2009, ASPLOS.
[35] Laxmi N. Bhuyan,et al. Design of an Adaptive Cache Coherence Protocol for Large Scale Multiprocessors , 1992, IEEE Trans. Parallel Distributed Syst..
[36] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[37] Maged M. Michael,et al. Evaluation of Blue Gene/Q hardware support for transactional memories , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[38] André Seznec,et al. A case for two-way skewed-associative caches , 1993, ISCA '93.
[39] João P. Cachopo,et al. Practical Parallel Nesting for Software Transactional Memory , 2013, DISC.
[40] Larry Carter,et al. Universal classes of hash functions (Extended Abstract) , 1977, STOC '77.
[41] Nuno Diegues,et al. Self-Tuning Intel Transactional Synchronization Extensions , 2014, ICAC.
[42] Christoforos E. Kozyrakis,et al. The ZCache: Decoupling Ways and Associativity , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[43] Aamer Jaleel,et al. Last level cache (LLC) performance of data mining workloads on a CMP - a case study of parallel bioinformatics workloads , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..
[44] Sandya Mannarswamy,et al. Compiler aided selective lock assignment for improving the performance of software transactional memory , 2010, PPoPP '10.
[45] Armin Heindl,et al. An analytic framework for performance modeling of software transactional memory , 2009, Comput. Networks.
[46] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[47] Bruno Ciciani,et al. Machine Learning-Based Self-Adjusting Concurrency in Software Transactional Memory Systems , 2012, 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.
[48] Yu Yang,et al. Efficient methods for formally verifying safety properties of hierarchical cache coherence protocols , 2010, Formal Methods Syst. Des..
[49] Torvald Riegel,et al. Dynamic performance tuning of word-based software transactional memory , 2008, PPoPP.
[50] Christoforos E. Kozyrakis,et al. Scalable and Efficient Fine-Grained Cache Partitioning with Vantage , 2012, IEEE Micro.
[51] Maurice Herlihy,et al. Embedded-TM: Energy and complexity-effective hardware transactional memory for embedded multicore systems , 2010, J. Parallel Distributed Comput..
[52] Vijayalakshmi Srinivasan,et al. A Tagless Coherence Directory , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[53] Larry Carter,et al. Universal Classes of Hash Functions , 1979, J. Comput. Syst. Sci..
[54] Sandhya Dwarkadas,et al. SPACE: Sharing pattern-based directory coherence for multicore scalability , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[55] Massimo Poncino,et al. Energy-optimal synchronization primitives for single-chip multi-processors , 2009, GLSVLSI '09.
[56] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[58] Hermann Härtig,et al. Measuring energy consumption for short code paths using RAPL , 2012, PERV.
[59] R. Govindarajan,et al. Emulating Optimal Replacement with a Shepherd Cache , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[60] Anant Agarwal,et al. LimitLESS directories: A scalable cache coherence scheme , 1991, ASPLOS IV.
[61] Deborah A. Wallach. PHD: A Hierarchical Cache Coherent Protocol , 1992 .
[62] Yehuda Afek,et al. Programming with hardware lock elision , 2013, PPoPP '13.
[63] Yu Yang,et al. Reducing Verification Complexity of a Multicore Coherence Protocol Using Assume/Guarantee , 2006, 2006 Formal Methods in Computer Aided Design.
[64] Wolfgang E. Nagel,et al. Power measurement techniques on standard compute nodes: A quantitative comparison , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[65] Torvald Riegel,et al. Evaluation of AMD's advanced synchronization facility within a complete transactional memory stack , 2010, EuroSys '10.
[66] Kunle Olukotun,et al. Transactional memory coherence and consistency , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..
[67] George Kurian,et al. ATAC: A 1000-core cache-coherent processor with on-chip optical network , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[68] Mikko H. Lipasti,et al. Improving multiprocessor performance with coarse-grain coherence tracking , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[69] Rachid Guerraoui,et al. Stretching transactional memory , 2009, PLDI '09.
[70] Nuno Diegues,et al. Time-warp: lightweight abort minimization in transactional memory , 2014, PPoPP '14.
[71] Ha Pham,et al. A 40nm 16-core 128-thread CMT SPARC SoC processor , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).
[72] Manuel E. Acacio,et al. On the design of energy‐efficient hardware transactional memory systems , 2013, Concurr. Comput. Pract. Exp..
[73] Michael Mitzenmacher,et al. More Robust Hashing: Cuckoo Hashing with a Stash , 2008, ESA.
[74] Mark Horowitz,et al. An evaluation of directory schemes for cache coherence , 1998, ISCA '98.
[75] Shankar Balachandran,et al. The Implications of Shared Data Synchronization Techniques on Multi-Core Energy Efficiency , 2012, HotPower.