Efficient Hardware-Supported Synchronization Mechanisms for Manycores
[1] K.L. Shepard,et al. Distributed Loss-Compensation Techniques for Energy-Efficient Low-Latency On-Chip Communication , 2007, IEEE Journal of Solid-State Circuits.
[2] M. Erez,et al. Express Virtual Channels with Capacitively Driven Global Links , 2009, IEEE Micro.
[3] S. Wong,et al. Near speed-of-light signaling over on-chip electrical interconnects , 2003 .
[4] K. Okada,et al. A Bidirectional- and Multi-Drop-Transmission-Line Interconnect for Multipoint-to-Multipoint On-Chip Communications , 2008, IEEE Journal of Solid-State Circuits.
[5] Anoop Gupta,et al. Parallel computer architecture - a hardware / software approach , 1998 .
[6] John B. Carter,et al. MP-LOCKs: replacing H/W synchronization primitives with message passing , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[7] David Wentzlaff,et al. TILE64 Processor: A 64-Core SoC with Mesh Interconnect , 2010 .
[8] Nathan R. Tallent,et al. Analyzing lock contention in multithreaded applications , 2010, PPoPP '10.
[9] Manuel E. Acacio,et al. Sim-PowerCMP: A Detailed Simulator for Energy Consumption Analysis in Future Embedded CMP Architectures , 2007, 21st International Conference on Advanced Information Networking and Applications Workshops (AINAW'07).
[10] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[11] Thomas E. Anderson,et al. The Performance Implications of Spin-Waiting Alternatives for Shared-Memory Multiprocessors , 1989, ICPP.
[12] Thomas E. Anderson,et al. The performance implications of thread management alternatives for shared-memory multiprocessors , 1989, SIGMETRICS '89.
[13] James R. Goodman,et al. Transactional lock-free execution of lock-based programs , 2002, ASPLOS X.
[14] Pen-Chung Yew,et al. An effective synchronization network for hot-spot accesses , 1992, TOCS.
[15] Gaël Thomas,et al. Efficient locking for multicore architectures , 2011 .
[16] John Sartori,et al. Low-Overhead, High-Speed Multi-core Barrier Synchronization , 2010, HiPEAC.
[17] Luiz André Barroso,et al. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009 .
[18] D. Burger,et al. Efficient Synchronization: Let Them Eat QOLB , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[19] Norman P. Jouppi,et al. Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[20] Gerard V. Kopcsay,et al. Packaging the Blue Gene/L supercomputer , 2005, IBM J. Res. Dev.
[21] Sunil D. Sherlekar. Intel Many Integrated Core (MIC) Architecture , 2012, ICPADS 2012.
[22] Michael L. Scott,et al. Algorithms for scalable synchronization on shared-memory multiprocessors , 1991, TOCS.
[23] Bradford M. Beckmann,et al. TLC: transmission line caches , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36.
[24] Justin Schauer,et al. High Speed and Low Energy Capacitively Driven On-Chip Wires , 2008, IEEE J. Solid State Circuits.
[25] Frank Mueller,et al. Token-Based Read/Write-Locks for Distributed Mutual Exclusion , 2000, Euro-Par.
[26] Eisse Mensink,et al. A 0.28pJ/b 2Gb/s/ch Transceiver in 90nm CMOS for 10mm On-Chip interconnects , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.
[27] Mary K. Vernon,et al. Efficient synchronization primitives for large-scale cache-coherent multiprocessors , 1989, ASPLOS 1989.
[28] Milos Prvulovic,et al. TLSync: Support for multiple fast barriers using on-chip transmission lines , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[29] Gianluca Palermo,et al. An efficient synchronization technique for multiprocessor systems on-chip , 2006, SIGARCH Comput. Archit. News.
[30] Beng-Hong Lim,et al. Reactive synchronization algorithms for multiprocessors , 1994, ASPLOS VI.
[31] Guang R. Gao,et al. Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences , 2006, Euro-Par.
[32] Luca Benini,et al. Design of a collective communication infrastructure for barrier synchronization in cluster-based nanoscale MPSoCs , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[33] W. Daniel Hillis,et al. The Network Architecture of the Connection Machine CM-5 , 1996, J. Parallel Distributed Comput..
[34] Edsger W. Dijkstra,et al. Solution of a problem in concurrent programming control , 1965, CACM.
[35] Anant Agarwal,et al. Smartlocks: Self-Aware Synchronization through Lock Acquisition Scheduling , 2009 .
[36] José E. Moreira,et al. Evaluation of a multithreaded architecture for cellular computing , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.
[37] Anant Agarwal,et al. Smartlocks: lock acquisition scheduling for self-aware synchronization , 2010, ICAC '10.
[38] William N. Scherer,et al. Scalable queue-based spin locks with timeout , 2001, PPoPP '01.
[39] Sunil Sherlekar. Tutorial: Intel many integrated core (MIC) architecture , 2012, 2012 IEEE 18th International Conference on Parallel and Distributed Systems.
[40] W. Daniel Hillis,et al. The network architecture of the Connection Machine CM-5 (extended abstract) , 1992, SPAA '92.
[41] Richard McDougall,et al. Solaris internals : core kernel components , 2001 .