Evaluating cache coherent shared virtual memory for heterogeneous multicore chips
[1] Leslie Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, 1979, IEEE Transactions on Computers.
[2] Christopher Batten, et al. The vector-thread architecture, 2004, Proceedings of the 31st Annual International Symposium on Computer Architecture (ISCA 2004).
[3] Patricia J. Teller. Translation-lookaside buffer consistency, 1990, Computer.
[4] William J. Dally, et al. Smart Memories: a modular reconfigurable architecture, 2000, ISCA '00.
[5] William J. Dally, et al. GPUs and the Future of Parallel Computing, 2011, IEEE Micro.
[6] Kevin M. Lepak, et al. Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor, 2010, IEEE Micro.
[7] Alan Jay Smith, et al. A class of compatible cache consistency protocols and their support by the IEEE Futurebus, 1986, ISCA '86.
[8] Pradeep Dubey, et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU, 2010, ISCA.
[9] Wu-chun Feng, et al. On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing, 2011, Symposium on Application Accelerators in High-Performance Computing (SAAHPC).
[10] Babak Falsafi, et al. Cuckoo directory: A scalable directory for many-core systems, 2011, IEEE 17th International Symposium on High Performance Computer Architecture (HPCA).
[11] Pat Conway, et al. The AMD Opteron Northbridge Architecture, 2007, IEEE Micro.
[12] Milind Girkar, et al. EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system, 2007, PLDI '07.
[13] Hong Jiang, et al. Pangaea: A tightly-coupled IA32 heterogeneous chip multiprocessor, 2008, International Conference on Parallel Architectures and Compilation Techniques (PACT).
[14] Kim M. Hazelwood, et al. Where is the data? Why you cannot debate CPU vs. GPU performance without the answer, 2011, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[15] Sanjay J. Patel, et al. Rigel: an architecture and scalable programming interface for a 1000-core accelerator, 2009, ISCA '09.
[16] Daniel J. Sorin, et al. UNified Instruction/Translation/Data (UNITD) coherence: One protocol to rule them all, 2010, 16th International Symposium on High-Performance Computer Architecture (HPCA-16).
[17] Brad Burgess, et al. Bobcat: AMD's Low-Power x86 Processor, 2011, IEEE Micro.
[18] John E. Stone, et al. An asymmetric distributed shared memory model for heterogeneous parallel systems, 2010, ASPLOS XV.
[19] David A. Wood, et al. A Primer on Memory Consistency and Cache Coherence, 2012, Synthesis Lectures on Computer Architecture.
[20] Milo M. K. Martin, et al. Why on-chip cache coherence is here to stay, 2012, Communications of the ACM.
[21] Hyesoon Kim, et al. TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture, 2012, IEEE International Symposium on High-Performance Computer Architecture (HPCA).
[22] H. Peter Hofstee, et al. Introduction to the Cell multiprocessor, 2005, IBM Journal of Research and Development.
[23] Somayeh Sardashti, et al. The gem5 simulator, 2011, ACM SIGARCH Computer Architecture News.
[24] H. Franke, et al. Introduction to the wire-speed processor and architecture, 2010, IBM Journal of Research and Development.
[25] Eric M. Schwarz, et al. IBM POWER6 microarchitecture, 2007, IBM Journal of Research and Development.
[26] Maurice Steinman, et al. AMD's "Llano" Fusion APU, 2011, IEEE Hot Chips 23 Symposium (HCS).
[27] Jungwon Kim, et al. COMIC++: A software SVM system for heterogeneous multicore accelerator clusters, 2010, 16th International Symposium on High-Performance Computer Architecture (HPCA-16).
[28] Ronak Singhal, et al. Inside Intel® Core microarchitecture (Nehalem), 2008, IEEE Hot Chips 20 Symposium (HCS).
[29] Daniel J. Sorin, et al. Exploring memory consistency for massively-threaded throughput-oriented processors, 2013, ISCA.
[30] Mike O'Connor, et al. Cache coherence for GPU architectures, 2013, IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[31] Keshav Pingali, et al. An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm, 2011.
[32] Edward T. Grochowski, et al. Larrabee: A many-core x86 architecture for visual computing, 2008, IEEE Hot Chips 20 Symposium (HCS).
[33] Charlie Johnson, et al. IBM Power Edge of Network Processor: A Wire-Speed System on a Chip, 2011, IEEE Micro.
[34] R. J. Joenk, et al. IBM Journal of Research and Development: information for authors, 1978.
[35] Sanjay J. Patel, et al. Cohesion: a hybrid memory model for accelerators, 2010, ISCA.
[36] Peter Sewell, et al. A Better x86 Memory Model: x86-TSO, 2009, TPHOLs.
[37] N. Gura, et al. UltraSPARC T2: A highly-threaded, power-efficient, SPARC SOC, 2007, IEEE Asian Solid-State Circuits Conference.
[38] Marcelo Yuffe, et al. A fully integrated multi-CPU, GPU and memory controller 32nm processor, 2011, IEEE International Solid-State Circuits Conference (ISSCC).